Re: AHCI timeouts on S3 resume
On Tue, May 18, 2010 at 10:14:03PM -0400, Damian Gerow wrote:
> A few months back, I swapped out my dying hard drive for a WD Scorpio Blue. Cheap, seemed reliable, and it was the only drive the local shop had in stock. However, it seems that AHCI doesn't like this device, and is having troubles during an S3 resume. It appears as though I'm experiencing two types of timeouts when resuming: recoverable, and non-recoverable.
>
> My question is: do I have a bad HDD, or is AHCI just not playing nicely?

Your hard disk looks generally OK; it isn't going bad. The one thing I can't tell is whether the disk is actually spinning back up on resume; you'd have to literally listen for it, or look at SMART Attribute #4 before and after a suspend/resume. I'll discuss analysis of SMART statistics further down.

The error messages you see coming from the AHCI driver indicate, to me, one of three things:

1) The ICH9 controller being stuck (possibly resume does something incorrectly to the controller),
2) FreeBSD not doing something quite right when coming out of suspend mode, or
3) the disk never waking up.

If I had to take a guess, I'd say #2. mav@ might be able to help determine if something is being done incorrectly in the AHCI driver after resume. If the driver is doing the Right Thing(tm), then the next thing to do would be to discuss the problem on freebsd-a...@. I can't help with these things.

I will point out, however, that you've set this value in loader.conf:

hw.pci.do_power_nodriver=2

I've read the sysctl -d description for it, but I am not familiar with sleep/power states so I don't know the implications. I worry that this value may be causing problems with your ICH9 controller. If you could comment this out and re-try suspend/resume to see if AHCI times out, you might determine if it's responsible for the problem.

> The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly the fastest drive on the planet. SMART seems to be relatively clean, with some mild questions surrounding attributes 191, 9/193, and 194:
>
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   3 Spin_Up_Time            0x0027   186   185   021    Pre-fail  Always       -       1675
>   4 Start_Stop_Count        0x0032   055   055   000    Old_age   Always       -       45174
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       723
> 191 G-Sense_Error_Rate      0x0032   072   072   000    Old_age   Always       -       28
> 193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       115712
> 194 Temperature_Celsius     0x0022   112   106   000    Old_age   Always       -       35

Attribute #3 indicates the total amount of time it takes for the drive to spin up (usually in milliseconds). I'll point out that there are drives out there (such as the WD Caviar Black) which report ~8s spin-up times when powered on; this is normal. The drive is actually able to function during the spin-up, which is why those systems don't take a full 8 seconds before they're able to read from the HD. I wanted to point out this attribute because you've brought up concerns over AHCI 15 second timeouts being hit.

Attribute #4 indicates the number of times the disk has been told by the controller to spin up or spin down. This counter should increase when your laptop goes in/out of suspend/resume. I wanted to point out this attribute because of what I said in my first paragraph.

Attribute #9 indicates the total amount of time the hard disk has been powered on (read: not asleep) during its lifetime. I can't tell you whether or not this value is correct; only you would be able to determine that, given your usage patterns. I *have* seen desktop drives which have reported this value incorrectly (meaning, servers I know have been on for thousands of hours that show 4 for this RAW_VALUE; probably a firmware bug).

Attribute #191 indicates a *rate* of G-shock events. The drive has a G-shock sensor inside of it. This value being non-zero is perfectly fine for laptops; people have a tendency to walk around with their systems on, tilt them sideways, place them on the desk firmly, etc. The sensor is sensitive, and it isn't intended to detect severity of shock (e.g. throwing your laptop across the room); it's intended to measure a rate. The RAW_VALUE doesn't mean anything to me; 28 what? We don't know. Only WD knows if that's a safe value or not. So what do we do in this case? We look at the adjusted value VALUE and compare it to WORST and THRESH. SMART disk failure won't get triggered until VALUE reaches 000, so 072 is pretty good. I'd say don't worry about it. (I'll use this opportunity to point out to readers that this is why looking at RAW_VALUE explicitly is not always the correct way to read SMART).

Attribute #193 indicates the number of times the actuator arm (thus heads) has been parked or come out of being parked.
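The comparison described above (reading the normalized VALUE against WORST and THRESH instead of staring at RAW_VALUE) can be sketched in a few lines of Python. The numbers are copied from the smartctl table in this thread. The helper names `headroom`, `is_failing` and `report` are made up for illustration, and treating an attribute as failed once VALUE is at or below THRESH is an assumption about how the comparison is applied, not something taken from WD or smartmontools documentation.

```python
# Normalized SMART values copied from the smartctl table quoted in this thread:
# (ID, name, VALUE, WORST, THRESH, RAW_VALUE)
ATTRS = [
    (3, "Spin_Up_Time", 186, 185, 21, 1675),
    (4, "Start_Stop_Count", 55, 55, 0, 45174),
    (9, "Power_On_Hours", 100, 100, 0, 723),
    (191, "G-Sense_Error_Rate", 72, 72, 0, 28),
    (193, "Load_Cycle_Count", 162, 162, 0, 115712),
    (194, "Temperature_Celsius", 112, 106, 0, 35),
]


def headroom(value, thresh):
    """Distance between the normalized VALUE and the failure threshold."""
    return value - thresh


def is_failing(value, thresh):
    """Assumption: an attribute counts as failed once VALUE <= THRESH."""
    return value <= thresh


def report(attrs):
    """Build one summary line per attribute; RAW_VALUE is deliberately ignored."""
    lines = []
    for attr_id, name, value, worst, thresh, _raw in attrs:
        status = "FAILING" if is_failing(value, thresh) else "ok"
        lines.append(
            f"{attr_id:>3} {name:<20} VALUE={value:03d} WORST={worst:03d} "
            f"THRESH={thresh:03d} headroom={headroom(value, thresh)} {status}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(ATTRS)))
```

With these inputs every attribute is reported as ok, which matches the "don't worry about it" reading of attribute 191 above.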
network probs rxcsum
Hi,

I have two machines running FreeBSD amd64 8.0-Stable with custom kernels. My newer box has had troubles with ssh from day one. I hoped a kernel upgrade would help, but it didn't. When I'd ssh into the box, ssh would exit with errors:

Bad packet length xx
Disconnecting: Packet corrupt.

After issuing:

ifconfig em0 -rxcons

everything was stable again. At first I figured it was a driver issue. However, I use the same NIC in my other box! What could be causing this problem?
Re: network probs rxcsum
On Wed, May 19, 2010 at 12:34:17PM +0200, Mark Stapper wrote:
> I have two machines running FreeBSD amd64 8.0-Stable with custom kernels.
> My newer box has had troubles with ssh from day one. I hoped a kernel
> upgrade would help, but it didn't. When I'd ssh into the box ssh would
> exit with errors:
>
> Bad packet length xx
> Disconnecting: Packet corrupt.
>
> after issuing:
>
> ifconfig em0 -rxcons
>
> everything was stable again. First I figured it'd be a driver issue.
> However, I use the same NIC in my other box! What could be causing this
> problem?

I think you mean -rxcsum, not -rxcons.

Could you please provide output from the following commands? Jack Vogel will probably respond later about this, but said output would help him.

- uname -a
- dmesg | grep em0
- pciconf -lvc

Thanks.

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: network probs rxcsum
On 19/05/2010 12:44, Jeremy Chadwick wrote:
> On Wed, May 19, 2010 at 12:34:17PM +0200, Mark Stapper wrote:
>> {snip original report}
>
> I think you mean -rxcsum, not -rxcons.
>
> Could you please provide output from the following commands? Jack Vogel
> will probably respond later about this, but said output would help him.
>
> - uname -a
> - dmesg | grep em0
> - pciconf -lvc
>
> Thanks.

Well, yes... something got garbled in my mind...

I'll provide the outputs when I get home, as the network connection just went down for no particular reason...

Greets,
Mark
Re: AHCI timeouts on S3 resume
Jeremy Chadwick wrote:
: On Tue, May 18, 2010 at 10:14:03PM -0400, Damian Gerow wrote:
: A few months back, I swapped out my dying hard drive for a WD Scorpio Blue.
: Cheap, seemed reliable, and it was the only drive the local shop had in
: stock. However, it seems that AHCI doesn't like this device, and is having
: troubles during an S3 resume. It appears as though I'm experiencing two
: types of timeouts when resuming: recoverable, and non-recoverable.
:
: My question is: do I have a bad HDD, or is AHCI just not playing nicely?
:
: Your hard disk looks generally OK; it isn't going bad. The one thing I
: can't tell is whether the disk is actually spinning back up on
: resume; you'd have to literally listen for it, or look at SMART
: Attribute #4 before and after a suspend/resume. I'll discuss analysis
: of SMART statistics further down.

The disk spins back up immediately on resume. I have no recollection of it /not/ doing so (it's definitely noticeable), and I just confirmed it with a few S3 cycles. I also checked the WD spec sheet, and the average drive ready time is 4s.

: I will point out, however, that you've set this value in loader.conf:
:
: hw.pci.do_power_nodriver=2
:
: I've read the sysctl -d description for it, but I am not familiar with
: sleep/power states so I don't know the implications. I worry that this
: value may be causing problems with your ICH9 controller. If you could
: comment this out and re-try suspend/resume to see if AHCI times out, you
: might determine if it's responsible for the problem.

That *should* just remove power from devices without a driver. But I removed it, rebooted, went through two S3 cycles, and I'm still seeing the timeouts. (Recoverable; of the two cycles I did, I didn't see a non-recoverable timeout.)

: The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly
: the fastest drive on the planet. SMART seems to be relatively clean, with
: some mild questions surrounding attributes 191, 9/193, and 194:
:
: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
:   3 Spin_Up_Time            0x0027   186   185   021    Pre-fail  Always       -       1675
:   4 Start_Stop_Count        0x0032   055   055   000    Old_age   Always       -       45174
:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       723
: 191 G-Sense_Error_Rate      0x0032   072   072   000    Old_age   Always       -       28
: 193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       115712
: 194 Temperature_Celsius     0x0022   112   106   000    Old_age   Always       -       35
:
: Attribute #9 indicates the total amount of time the hard disk has been
: powered on (read: not asleep) during its lifetime. I can't tell you
: whether or not this value is correct; only you would be able to
: determine that, given your usage patterns. I *have* seen desktop drives
: which have reported this value incorrectly (meaning, servers I know have
: been on for thousands of hours that show 4 for this RAW_VALUE;
: probably a firmware bug).

I combined attributes 9 and 193 together because it seems like a load cycle count of ~116k with 723 power-on hours is a bit high. I believe laptop HDDs are designed to handle a higher rate of load cycle counts, but I've never really paid attention to them -- save on my previously dying drive, which had broken 1M, and started screeching when doing some seeks.

But yes, that 723 power-on hours seems accurate.

: Attribute #193 indicates the number of times the actuator arm (thus
: heads) has been parked or come out of being parked. There is a known
: problem with some models of WD Green Power (GP) drives where the drive
: spends an excessive amount of time parking, and this counter increases
: rapidly. One FreeBSD user who reported this problem to Western Digital
: received a replacement firmware which addressed the problem. The WD
: Scorpio Blue drives (or some of them) may have this same problem --
: HOWEVER, this model of hard disk (2.5" FF) is *specifically* intended
: for laptops and low-power environments, so the behaviour seen in this
: case could be 100% normal. WD would hopefully know.

I'm fairly certain that WD only includes that IntelliPark feature on the GP drives. At least, WD doesn't indicate that there's any of their fancy new GP-related tricks on the Scorpio Blue line.

I'd actually recently dropped my vfs.zfs.txg.timeout to 5, as I was experiencing some pretty horrible stalls when it was left at default (30, I believe). I was curious to see if this decreased the rate of my Load_Cycle_Count, but I'm already at ~122k. Given that this drive is rated to handle 600k, it makes me wonder if there isn't something like IntelliPark on this drive.

: Hope this helps.

Aye. It confirms that SMART clears my drive -- thanks!
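To put the load cycle concern above into numbers, here is a small worked calculation in Python using the figures from the SMART table: 115712 load cycles over 723 power-on hours is roughly 160 cycles per power-on hour. The 600k rating is the figure quoted in the message, not checked against a datasheet, and the projection assumes the parking rate stays constant, which it may well not. The function names are invented for this sketch.

```python
RAW_LOAD_CYCLES = 115712     # attribute 193 RAW_VALUE from the SMART table
POWER_ON_HOURS = 723         # attribute 9 RAW_VALUE from the SMART table
RATED_LOAD_CYCLES = 600_000  # rating as quoted in the message (unverified)


def cycles_per_hour(cycles, hours):
    """Average number of head load/unload cycles per power-on hour."""
    return cycles / hours


def hours_until_rating(cycles, hours, rated):
    """Power-on hours left before the rated count, assuming a constant rate."""
    remaining = max(rated - cycles, 0)
    return remaining / cycles_per_hour(cycles, hours)


if __name__ == "__main__":
    rate = cycles_per_hour(RAW_LOAD_CYCLES, POWER_ON_HOURS)
    left = hours_until_rating(RAW_LOAD_CYCLES, POWER_ON_HOURS, RATED_LOAD_CYCLES)
    print(f"{rate:.1f} load cycles per power-on hour")
    print(f"about {left:.0f} more power-on hours to reach the rated count")
```

Re-running the same calculation with a later pair of readings would show whether a tuning change such as the vfs.zfs.txg.timeout adjustment actually lowered the rate.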
7-stable compile broken: kern_ntptime
After the recent kern_ntptime updates:

cc -c -O2 -pipe -fno-strict-aliasing -march=pentium4 -std=c99 -Wall
-Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
-Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/usr/src/sys
-I/usr/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS
-include opt_global.h -fno-common -finline-limit=8000 --param
inline-unit-growth=100 --param large-function-growth=1000
-fno-omit-frame-pointer -mno-align-long-strings
-mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2
-mno-sse3 -ffreestanding -Werror /usr/src/sys/kern/kern_ntptime.c
cc1: warnings being treated as errors
/usr/src/sys/kern/kern_ntptime.c: In function 'periodic_resettodr':
/usr/src/sys/kern/kern_ntptime.c:985: warning: implicit declaration of function 'resettodr'
/usr/src/sys/kern/kern_ntptime.c:985: warning: nested extern declaration of 'resettodr'
/usr/src/sys/kern/kern_ntptime.c:989: warning: implicit declaration of function 'callout_schedule'
/usr/src/sys/kern/kern_ntptime.c:989: warning: nested extern declaration of 'callout_schedule'
*** Error code 1

Stop in /usr/obj/usr/src/sys/AUBURN.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.

imb
Re: 7-stable compile broken: kern_ntptime
On Wed, May 19, 2010 at 11:13:51AM -0400, Michael Butler wrote:
> After the recent kern_ntptime updates:
>
> {snip CC warnings}

The problem was addressed 6 minutes ago. You'll need to wait for the cvsup mirrors to pick up the change; otherwise, use cvsup-master.freebsd.org (not recommended).

CVS commit: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_ntptime.c

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: network probs rxcsum
On 05/19/10 12:44, Jeremy Chadwick wrote:
> On Wed, May 19, 2010 at 12:34:17PM +0200, Mark Stapper wrote:
>> {snip original report}
>
> I think you mean -rxcsum, not -rxcons.
>
> Could you please provide output from the following commands? Jack Vogel
> will probably respond later about this, but said output would help him.
>
> - uname -a
> - dmesg | grep em0
> - pciconf -lvc
>
> Thanks.

Could it be a shared interrupt problem? Even though ssh worked with rxcsum disabled, network performance was horrible! Using my onboard NIC instead of em0 cleared it right up! em0 is a PCI add-on card.
Here are the outputs you requested:

[r...@mario ~]# uname -a
FreeBSD mario 8.0-STABLE FreeBSD 8.0-STABLE #0: Tue May 18 19:37:30 CEST 2010 root@:/usr/obj/usr/src/sys/mario amd64

[r...@mario ~]# dmesg |grep em0
em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.1 port 0x9c00-0x9c3f mem 0xfdfa-0xfdfb,0xfdfc-0xfdfd irq 18 at device 6.0 on pci2
em0: [FILTER]
em0: Ethernet address: 00:1b:21:4b:8b:85
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN

[r...@mario ~]# pciconf -lvc
no...@pci0:0:0:0: class=0x05 card=0x02f010de chip=0x02f410de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Host Bridge'
    class = memory
    subclass = RAM
    cap 08[44] = HT slave
    cap 08[e0] = HT MSI address window disabled at 0xfee0
no...@pci0:0:0:1: class=0x05 card=0x02fa10de chip=0x02fa10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 0'
    class = memory
    subclass = RAM
no...@pci0:0:0:2: class=0x05 card=0x02fe10de chip=0x02fe10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 1'
    class = memory
    subclass = RAM
no...@pci0:0:0:3: class=0x05 card=0x02f810de chip=0x02f810de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 5'
    class = memory
    subclass = RAM
no...@pci0:0:0:4: class=0x05 card=0x02f910de chip=0x02f910de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 4'
    class = memory
    subclass = RAM
no...@pci0:0:0:5: class=0x05 card=0x02ff10de chip=0x02ff10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Host Bridge'
    class = memory
    subclass = RAM
    cap 00[44] = unknown
no...@pci0:0:0:6: class=0x05 card=0x027f10de chip=0x027f10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 3'
    class = memory
    subclass = RAM
no...@pci0:0:0:7: class=0x05 card=0x027e10de chip=0x027e10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'C51 Memory Controller 2'
    class = memory
    subclass = RAM
pc...@pci0:0:4:0: class=0x060400 card=0x10de chip=0x02fb10de rev=0xa1 hdr=0x01
    vendor = 'NVIDIA Corporation'
    device = 'C51 PCIe Bridge'
    class = bridge
    subclass = PCI-PCI
    cap 0d[40] = PCI Bridge card=0x10de
    cap 01[48] = powerspec 2 supports D0 D3 current D0
    cap 05[50] = MSI supports 2 messages, 64 bit
    cap 08[60] = HT MSI address window disabled at 0xfee0
    cap 10[80] = PCI-Express 1 root port max data 128(128) link x16(x16)
no...@pci0:0:8:0: class=0x05 card=0xcb8410de chip=0x036910de rev=0xa1 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'MCP55 Memory Controller'
    class = memory
    subclass = RAM
    cap 08[44] = HT slave
    cap 08[dc] = HT MSI address window enabled at 0xfee0
is...@pci0:0:9:0: class=0x060100 card=0xcb8410de chip=0x036010de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'MCP55 LPC Bridge'
    class = bridge
    subclass = PCI-ISA
no...@pci0:0:9:1: class=0x0c0500 card=0xcb8410de chip=0x036810de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'SMBus controller ((0xCB84 integrated chip nForce Pro 3400))'
    class = serial bus
    subclass = SMBus
    cap 01[44] = powerspec 2 supports D0 D3 current D0
non...@pci0:0:9:3: class=0x0b4000
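The dmesg output above shows em0 going UP and DOWN over and over, which fits the earlier remark that the connection "just went down". As a rough aid for anyone comparing runs (for example before and after swapping the card or cable), the following Python sketch counts those transitions. The sample lines are copied from the output in this message; `link_events` and `count_flaps` are hypothetical helper names, and counting an UP followed by a DOWN as one "flap" is simply a convention chosen here.

```python
import re

# Sample lines copied from the "dmesg | grep em0" output in this message.
DMESG = """\
em0: [FILTER]
em0: Ethernet address: 00:1b:21:4b:8b:85
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
"""

LINK_RE = re.compile(r"^(\w+): link state changed to (UP|DOWN)$")


def link_events(text, ifname):
    """Return the ordered list of 'UP'/'DOWN' events logged for one interface."""
    events = []
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match and match.group(1) == ifname:
            events.append(match.group(2))
    return events


def count_flaps(events):
    """Count UP events that are immediately followed by a DOWN event."""
    return sum(1 for a, b in zip(events, events[1:]) if a == "UP" and b == "DOWN")


if __name__ == "__main__":
    events = link_events(DMESG, "em0")
    print(f"{len(events)} link events, {count_flaps(events)} flaps, last state {events[-1]}")
```

In practice the input would come from the saved output of `dmesg` rather than an embedded string.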
Re: network probs rxcsum
vmstat -i ?

Custom kernel? If you use stock kernel do you still see this problem? If you use 8 RELEASE do you see the problem?

Jack

On Wed, May 19, 2010 at 11:06 AM, Mark Stapper st...@mapper.nl wrote:
> {snip quoted message and command output}
Re: Kernel panic when unpluggin AC adaptor
On Tue, May 18, 2010 at 10:47 PM, Brandon Gooch jamesbrandongo...@gmail.com wrote:
> On Tue, May 18, 2010 at 9:04 AM, Giovanni Trematerra giovanni.tremate...@gmail.com wrote:
>> On Sat, May 15, 2010 at 9:12 PM, Brandon Gooch jamesbrandongo...@gmail.com wrote:
>>> On Thu, May 13, 2010 at 7:25 PM, Giovanni Trematerra giovanni.tremate...@gmail.com wrote:
>>>> On Thu, May 13, 2010 at 1:09 AM, Brandon Gooch jamesbrandongo...@gmail.com wrote:
>>>>> On Wed, May 12, 2010 at 9:41 AM, Attilio Rao atti...@freebsd.org wrote:
>>>>>> 2010/5/12 David DEMELIER demelier.da...@gmail.com:
>>>>>>> I removed the patch, and built the kernel (I updated the src this morning) and it does not panic now. It's really odd. If it reappears soon I will tell you.
>>>>>>
>>>>>> I looked at the code with Giovanni and I have the feeling that the race with the idle thread may still be fatal. We need to fix that.
>>>>>>
>>>>>> Attilio
>>>>>
>>>>> That seems to be the case, as my laptop shows about an 80-85 % chance of experiencing a panic if left idle for long-ish periods of time (2 to 4 hours). I usually rebuild world or big ports overnight, and more often than not I wake up to a panicked machine, same situation every time:
>>>>>
>>>>> ...
>>>>> rman_get_bushandle() at rman_get_bushandle+0x1
>>>>> sched_idletd() at sched_idletd+0x123
>>>>> fork_exit() at fork_exit+0x12a
>>>>> fork_trampoline() at fork_trampoline+0xe
>>>>> ...
>>>>>
>>>>> The kernel/userland is rebuilt, the ports are finished compiling -- it's in the time AFTER the completion of all tasks that the machine gets bored and tries to kill itself :)
>>>>>
>>>>> I have seen the AC adapter plug/unplug hang in the past on this laptop, but I never made the connection between the events, as nowadays my laptop usually stays plugged in :(
>>>>>
>>>>> Attilio, I hope you can track this one down, let me know if I can do anything to help or test...
>>>>
>>>> Attilio and I came up with this patch. It seems ready for stress testing and review. Please test and report back.
>>>>
>>>> Thank you
>>>>
>>>> P.S: all the faults are only mine.
>>>
>>> I tried the patch, and my kernel panics on boot. I have 8.5MB(!) of JPG images (6 of them) if anyone needs to see them. I'm looking for a place to post them, but if anyone wants, I can send via e-mail...
>>
>> Hi Brandon,
>> Could you please try this new one? The panic at boot stage should be solved; at least I tried on an 8-way machine and all went OK at boot. Please remove WITNESS_SKIPSPIN from your kernel config file. This patch might be sub-optimal and contains style(9) errors, but if it works we are on the right track. Let me know if it works for you.
>
> Applied the patch, built, installed, and booted new kernel: no panic! I will remove WITNESS_SKIPSPIN and build another kernel. Then I'll try to trigger the panic (by letting my laptop sit idle after a buildworld session).
>
> Thanks for giving this some attention, I hope you and/or others are able to get to the bottom of this...

Hey everyone, just reporting in:

The laptop has experienced the longest uptime it's seen in a while -- so far, so good!

I'll keep the machine up and running just in case...

-Brandon