Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Jeremy Chadwick wrote:
On Fri, Jan 25, 2008 at 06:17:24PM -0700, Joe Peterson wrote:

Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools invisible. Import seemed to bring them back...

I did go into single-user mode and attempt to do ZFS-related commands, which might explain the "no datasets available" once I was back in multiuser! I would classify that as a bug, and one which is going to cause all sorts of hair-pulling for administrators in the future. I wonder what it's caused by.

In single user, / is read-only and so /boot/zfs/zpool.cache can't be created/updated.

Henri

The import technique I found on a forum somewhere, or possibly on a Solaris mailing list. I was really sweating there for a moment...

So, is the disk toast, or can you still read anything from it (part table, etc.)?

The ad6 disk (/backups) fsck'd cleanly without any missing files or anomalies. The ZFS pool that has two striped disks (ad8 and ad10) is fully intact too, with no loss of data that I can see. I'll have to run a scrub after I'm done copying data over to ad6, just to make sure though.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Well-supported SAS RAID card for 6.3?
Josh Endries wrote:

Hello, I'm buying a new server and will put 6.3 on it and would like to use SAS; normally I use 3ware SATA. I've been reading a lot including man pages but can't seem to find definitive information on SAS cards that are well-supported and work well. I've found reports that cards listed in man pages don't seem to work, and confusion about chips/cards. Does anyone have experience with recent SAS cards or machines with integrated chips?

LSI, Highpoint, Areca, 3ware, and Adaptec are all well supported in FreeBSD.

Scott
dmesg : no output on 1 of 2 7-stable boxes
One of two laptops running 7-stable shows nothing with dmesg (the other is OK). Needs fresh eyes please, as I've already checked all this:

- Text is in /var/log/messages, readable to normal users.
- df shows enough disc.
- I patched out all of loader.conf, except the essential hw.ata.ata_dma=0 (that was to remove any *verbose* that might be overflowing).
- I tried loader.conf kern.msgbuf=64000
- Tried a Generic kernel.
- I've recompiled and installed dmesg and kernels.
- Haven't done a make world (slow box), but repeated make all ; make install with each day's new source.
- Both hosts running same /usr/src from yesterday: /pub/FreeBSD/development/CTM/src-7/src-7.0102.gz Jan 25 15:25 TZ=GMT+01:00
- cd /etc;grep dmesg *
    rc.conf:dmesg_enable=YES # Save dmesg(8) to /var/run/dmesg.boot
    Binary file rc.d matches # Just the standard /etc/rc.d/dmesg
    rc.local:/sbin/dmesg > /tmp/dmesg.rc.local # -rw-r--r-- 1 root wheel 0 Jan 25 14:01 /tmp/dmesg.rc.local
- mergemaster -sicvP

Any ideas please?

Julian
-- 
Julian Stacey. Munich Consultant: BSD Linux Unix. http://berklix.com
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
I performed a ZFS scrub, which finished yesterday, and no new /var/log/messages errors were reported during that time. However, the scrub found something interesting:

    crater# zpool status -v
      pool: tank
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       1     3     2
              ad0s1d    ONLINE       1     3     2

    errors: Permanent errors have been detected in the following files:

            /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

Note that I have not touched this file since copying it to this drive. So, it seems one file failed a checksum check during the scrub. I now (expectedly) get errors trying to read this file - probably ZFS indicating the condition.

When I just logged in tonight, I got two more /var/log/messages disk messages about WRITE_DMA48 TIMEOUT/FAILURE - might be a coincidence (just as I was typing my password).

Also, smartctl still shows PASSED, however, this is interesting:

    195 Hardware_ECC_Recovered 0x001a 061 046 000 Old_age Always - 9070

The number is much *smaller* now! It was 6 a few minutes before this... wrap around?

Hmm, I'm really not sure, at this point, what is going on. So I have started a SeaTools (disk scanner from Seagate) long test of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under Linux, of disk contents, and got no messages about anything wrong.

If I can turn on any debugging info to help determine if this is software-related, let me know the magic keywords to use. :)

-Joe
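For anyone who wants to track that attribute over time, here is a minimal Python sketch that splits a smartctl attribute row into its columns. It assumes the usual ten-column layout of `smartctl -A` output (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw); the names `SmartAttribute`, `parse_smart_row` and `below_threshold` are made up for illustration. As far as I know the raw column is vendor-specific, so the sketch keeps it as text and only compares the normalized value against the threshold.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int        # normalized current value
    worst: int        # lowest normalized value recorded so far
    thresh: int       # failure threshold for the normalized value
    attr_type: str
    updated: str
    when_failed: str
    raw_value: str    # vendor-specific, deliberately left as text


def parse_smart_row(row: str) -> SmartAttribute:
    """Split one attribute row into its ten whitespace-separated columns."""
    fields = row.split(None, 9)
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}: {row!r}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        flag=int(fields[2], 16),
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        attr_type=fields[6],
        updated=fields[7],
        when_failed=fields[8],
        raw_value=fields[9].strip(),
    )


def below_threshold(attr: SmartAttribute) -> bool:
    """True when the normalized value has dropped to or under a non-zero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


# The row quoted in the message above.
ROW = "195 Hardware_ECC_Recovered 0x001a 061 046 000 Old_age Always - 9070"
attr = parse_smart_row(ROW)
print(attr.name, attr.value, attr.worst, attr.thresh, attr.raw_value)
print("below threshold:", below_threshold(attr))
```

With a threshold of 000 the normalized value (061) can never trip the check, which is consistent with smartctl still reporting PASSED while the raw counter moves around.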
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Joe Peterson wrote:

So I have started a SeaTools (disk scanner from Seagate) long test of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under Linux, of disk contents, and got no messages about anything wrong.

Update: both SHORT and LONG tests passed for this drive in SeaTools. Hmph... the mystery remains.

-Joe
Re: Well-supported SAS RAID card for 6.3?
Hi Scott,

Scott Long wrote:

LSI, Highpoint, Areca, 3ware, and Adaptec are all well supported in FreeBSD.

Are they? I don't see any reference to the LSI8708, LSI or LSI1068 in the man pages I can find... does anyone use these? Some people have problems with the PERC 6/i (which I think is an LSI), which makes me wonder. 3ware says their SAS driver is in beta... I dunno, I'm not trying to be confrontational, maybe I'm just overly skeptical or paranoid or something. I'm leaning towards 3ware right now, but still looking for success/failure stories.

Thanks for the responses!

J
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Henri Hennebert wrote:
Jeremy Chadwick wrote:
On Fri, Jan 25, 2008 at 06:17:24PM -0700, Joe Peterson wrote:

Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools invisible. Import seemed to bring them back...

I did go into single-user mode and attempt to do ZFS-related commands, which might explain the "no datasets available" once I was back in multiuser! I would classify that as a bug, and one which is going to cause all sorts of hair-pulling for administrators in the future. I wonder what it's caused by.

In single user, / is read-only and so /boot/zfs/zpool.cache can't be created/updated.

But it's still readable. The issue is that hostid isn't set (by /etc/rc.d/hostid).
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Joe Peterson wrote:
Joe Peterson wrote:

So I have started a SeaTools (disk scanner from Seagate) long test of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under Linux, of disk contents, and got no messages about anything wrong.

Update: both SHORT and LONG tests passed for this drive in SeaTools. Hmph... the mystery remains.

Were both tests done in the same machine (actually, I mean the same PSU)?
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Joe Peterson writes:

Glad you got it back! Yes, when I was first playing with ZFS, I noticed that booting between single and multi user mode could make the pools invisible. Import seemed to bring them back...

Yeah. ZFS pools record the hostid of the system that accessed them last. When you boot in single-user mode, /etc/rc.d/hostid doesn't get run, so the hostid is zero, which doesn't match the hostid in the pool, so the pool doesn't show up without an import.

Workaround: always make sure you run "/etc/rc.d/hostid start" in single-user before doing any ZFS tinkering.
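The explanation above can be turned into a toy model. This is only a Python sketch of the behaviour as described in this message, not ZFS code: `Pool`, `visible_without_import`, `import_pool` and the example hostid value are all hypothetical. It shows why an import done in single-user mode (hostid zero) would leave the pool invisible again after the next multi-user boot, which matches the "no datasets available" symptom reported earlier in the thread.

```python
from dataclasses import dataclass

# Per the message: the hostid is zero until /etc/rc.d/hostid has run.
SINGLE_USER_HOSTID = 0


@dataclass
class Pool:
    name: str
    last_hostid: int  # hostid of the system that accessed the pool last


def visible_without_import(pool: Pool, system_hostid: int) -> bool:
    """The pool shows up on its own only if the recorded hostid matches."""
    return pool.last_hostid == system_hostid


def import_pool(pool: Pool, system_hostid: int) -> None:
    """An import makes the pool record the importing system's hostid."""
    pool.last_hostid = system_hostid


def demo() -> list:
    multi_user_hostid = 0x1A2B3C4D  # hypothetical value, for illustration only
    tank = Pool("tank", multi_user_hostid)
    steps = []

    steps.append(("multi-user boot", visible_without_import(tank, multi_user_hostid)))
    steps.append(("single-user boot", visible_without_import(tank, SINGLE_USER_HOSTID)))

    import_pool(tank, SINGLE_USER_HOSTID)  # ZFS tinkering without setting hostid
    steps.append(("single-user, after import", visible_without_import(tank, SINGLE_USER_HOSTID)))
    steps.append(("back in multi-user", visible_without_import(tank, multi_user_hostid)))

    import_pool(tank, multi_user_hostid)   # the import that brings it back
    steps.append(("multi-user, after import", visible_without_import(tank, multi_user_hostid)))
    return steps


for label, visible in demo():
    print(f"{label:28s} visible={visible}")
```

In this model, running the hostid script before touching ZFS in single-user mode means the two hostids are equal throughout, so no mismatch ever arises.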
ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
After upgrading from RELENG_6_2 to 7.0-RC1 I am experiencing the system becoming unresponsive during intensive disk operations. The only solution is a power off. These hangs occurred pretty much within a few hours of first running 7.0-RC1. 6.2R was fine. I am not running ZFS. The hangs are easy to reproduce by an unrar of an archive of ~4.4GB. The system will not hang during normal operations.

Below are the errors I get, and the output from dmesg, atacontrol cap and smartctl tests. (The long test is aborted below, but as indicated previously it completed without errors.)

These are the errors:
---
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2415
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2543
---
and so on...

This computer has only the one SATA hard drive.

dmesg info:
atapci1: VIA 8237A SATA150 controller port 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc807,0xc480-0xc483,0xc400-0xc40f,0xc000-0xc0ff irq 21 at device 15.0 on pci0
atapci1: [ITHREAD]
ata4: ATA channel 0 on atapci1
ad8: 238475MB SAMSUNG SP2504C VT100-50 at ata4-master SATA150
---
atacontrol cap ad8

Protocol              Serial ATA II
device model          SAMSUNG SP2504C
serial number         S09QJ1CP201268
firmware revision     VT100-50
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       488397168 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value      Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      0/0x00
automatic acoustic management  yes      no      0/0x00     254/0xFE
---
smartctl -a /dev/ad8:

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint P120 series
Device Model:     SAMSUNG SP2504C
Serial Number:    S09QJ1CP201268
Firmware Version: VT100-50
User Capacity:    250 059 350 016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Sat Jan 26 22:44:43 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25) The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                 (5028) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error
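As a quick consistency check on the figures in that output, here is a small Python calculation. The only assumption is a 512-byte sector size (the constant `SECTOR_BYTES` is mine, not something printed by atacontrol); the sector counts are copied from the `atacontrol cap` listing.

```python
SECTOR_BYTES = 512            # assumption: 512-byte logical sectors

lba28_sectors = 268435455     # "lba supported" from atacontrol cap
lba48_sectors = 488397168     # "lba48 supported" from atacontrol cap

capacity_bytes = lba48_sectors * SECTOR_BYTES
capacity_mib = capacity_bytes // (1024 * 1024)
lba28_limit_bytes = lba28_sectors * SECTOR_BYTES

print(capacity_bytes)                      # 250059350016, smartctl's "User Capacity"
print(capacity_mib)                        # 238475, the size shown on the dmesg ad8 line
print(capacity_bytes > lba28_limit_bytes)  # True: the disk extends past 28-bit LBA
```

So the three tools agree with each other about the drive's geometry, and nothing in the size reporting itself looks off.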
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Same here. On an amd64 system with one SATA disk (Western Digital Caviar Green Power) on an amd690G chipset, with UFS and intensive disk activity the system hangs and in the end it may panic. I csupped today and rebuilt world and the GENERIC kernel, but it's still very unstable; sometimes it even hangs when activating geom volumes at boot time... I must add that this is a new system so I'm not 100% sure the hardware is sane. Using ZFS it also crashed when doing intensive I/O. I can supply additional info later if that may help.

Cheers,
Remco

On Sat, Jan 26, 2008 at 10:54:17PM +0100, Nikolaj Farrell wrote:

After upgrading from RELENG_6_2 to 7.0-RC1 I am experiencing the system becoming unresponsive during intensive disk operations. The only solution is a power off. These hangs occurred pretty much within a few hours of first running 7.0-RC1. 6.2R was fine. I am not running ZFS. The hangs are easy to reproduce by an unrar of an archive of ~4.4GB. The system will not hang during normal operations.

[...]
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
Ivan Voras wrote:

Were both tests done in the same machine (actually, I mean the same PSU)?

Yes - I deliberately changed nothing (not even cables) before I ran the tests. I didn't want any variables.

-Joe
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Remco van Bekkum wrote:

Same here. On an amd64 system with one SATA disk (Western Digital Caviar Green Power) on an amd690G chipset, with UFS and intensive disk activity the system hangs and in the end it may panic. I csupped today and rebuilt world and the GENERIC kernel, but it's still very unstable; sometimes it even hangs when activating geom volumes at boot time... I must add that this is a new system so I'm not 100% sure the hardware is sane. Using ZFS it also crashed when doing intensive I/O.

This is very interesting. It seems there are several of us who are experiencing something that *looks* like hardware (disk) issues when using 7.0.

Could this be related to the mouse freeze issue? Could some process be locking/grabbing the CPU at inopportune times and causing not only the freezing symptoms but also read/write problems?

Can anyone else using 7.0 who hasn't already (especially those using ZFS) check his/her /var/log/messages for disk TIMEOUTs or other disk error messages? If this is widespread, I think the chances are slim that it is a hardware problem in every case.

-Joe
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Can anyone else using 7.0 who hasn't already (especially those using ZFS) check his/her /var/log/messages for disk TIMEOUTs or other disk error messages? If this is widespread, I think the chances are slim that it is a hardware problem in every case.

I noticed this week that I was getting WRITE_DMA timeouts on two of my disks on my 7.0-RC1 box. The strange thing in my case was that only 2 of my 3 drives were exhibiting the behavior. The two having the problem were both SATA300 drives, while the third was SATA150. I jumpered the two bad drives to SATA150, and the timeouts went away. I replaced the motherboard, set the two drives back to SATA300 and have not had a timeout since.

I was thinking maybe it was related, but in my case it was truly a hardware failure. Here were the messages, for what it's worth:

Jan 23 10:05:01 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1525567
Jan 23 10:05:09 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3828159
Jan 23 10:07:27 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3450111
Jan 23 10:07:33 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4561983
Jan 23 11:30:27 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4178303
Jan 23 11:30:33 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5660287
Jan 23 11:30:39 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=6805951
Jan 23 11:30:48 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=9856959

Josh
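For anyone answering the request to check /var/log/messages, a rough Python sketch for tallying these entries per device follows. The regular expression is written against the message format quoted in this thread; the plural "retries left" variant is an assumption on my part, and `TIMEOUT_RE`, `count_timeouts` and `SAMPLE` are illustrative names. The sample lines are copied from the logs posted above.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1525567"
TIMEOUT_RE = re.compile(
    r"kernel: (?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def count_timeouts(lines):
    """Count TIMEOUT entries per (device, command); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["dev"], match["cmd"])] += 1
    return counts


SAMPLE = """\
Jan 23 10:05:01 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1525567
Jan 23 10:05:09 pflog kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3828159
Jan 26 19:55:36 athlon kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Jan 26 19:55:36 athlon kernel: ad8: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2415
"""

counts = count_timeouts(SAMPLE.splitlines())
for (dev, cmd), n in sorted(counts.items()):
    print(f"{dev} {cmd}: {n}")
```

To run it against a real log, pass an open file instead of the sample, for example `count_timeouts(open("/var/log/messages"))`. The WARNING lines are skipped on purpose so that only the retried commands are counted.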
Unionfs on RELENG_6
Now that the 6.3 release notes advertise its reimplementation, isn't it safe to remove the warning at the end of mount_unionfs(8)?

-- 
Mahnahmahnah!
./iicbus_if.h: In function `IICBUS_TRANSFER':
Hi,

While I was updating another server, which is 6.2-STABLE, make stopped with:

./iicbus_if.h:124: warning: struct iic_msg declared inside parameter list
./iicbus_if.h:124: warning: its scope is only this definition or declaration, which is probably not what you want
./iicbus_if.h:127: warning: struct iic_msg declared inside parameter list
./iicbus_if.h: In function `IICBUS_TRANSFER':
./iicbus_if.h:131: warning: passing arg 2 of pointer to function from incompatible pointer type
*** Error code 1

Stop in /usr/src/sys/modules/i2c/if_ic.
*** Error code 1

Stop in /usr/src/sys/modules/i2c.
*** Error code 1

Stop in /usr/src/sys/modules.
*** Error code 1
:

What is this? What should I do?

-- 
Share now a pigeon's flight
Bluebound along the ancient skies,
Its women forever hair and mammal,
A Mediterranean town may arise
If you rip apart a pigeon's heart.
Re: ad0: TIMEOUT - WRITE_DMA type errors with 7.0-RC1
On Sat, Jan 26, 2008 at 01:15:31PM -0700, Joe Peterson wrote:
Joe Peterson wrote:

So I have started a SeaTools (disk scanner from Seagate) long test of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under Linux, of disk contents, and got no messages about anything wrong.

Update: both SHORT and LONG tests passed for this drive in SeaTools. Hmph... the mystery remains.

As do mine -- I also completed both short and long tests in SeaTools on my drive (finished early this evening). Absolutely no errors, everything passed.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
On Sat, Jan 26, 2008 at 04:28:29PM -0700, Joe Peterson wrote:
Remco van Bekkum wrote:

Same here. On an amd64 system with one SATA disk (Western Digital Caviar Green Power) on an amd690G chipset, with UFS and intensive disk activity the system hangs and in the end it may panic. I csupped today and rebuilt world and the GENERIC kernel, but it's still very unstable; sometimes it even hangs when activating geom volumes at boot time... I must add that this is a new system so I'm not 100% sure the hardware is sane. Using ZFS it also crashed when doing intensive I/O.

This is very interesting. It seems there are several of us who are experiencing something that *looks* like hardware (disk) issues when using 7.0.

We need Soren Schmidt and/or Xin Li to help with this situation. I really don't know what we can provide (other than hardware, which I am more than happy to donate).

In my case, I was able to let the machine remain broken for 15 minutes or so, and it eventually panic'd. Of course due to PR 118255, it's becoming difficult to get a coredump.

Could this be related to the mouse freeze issue? Could some process be locking/grabbing the CPU at inopportune times and causing not only the freezing symptoms but also read/write problems?

I don't use a mouse on my systems, but what you've described is possible. I'm guessing some sort of loop in the kernel (or a driver) which holds the system down for too long.

If this is widespread, I think the chances are slim that it is a hardware problem in every case.

I'm in definite agreement here. I think it might be worthwhile to note what hardware we're all using, in case there's something similar between our systems (chipset, disk vendor, etc.).

My system is as follows; timeouts were reported during an rsync of data from the ZFS stripe (ad8+ad10) to a UFS2 filesystem on ad6. The system eventually panic'd after remaining deadlocked (while kernel messages about timeouts kept printing on the console, for ad6 only) for 10-15 minutes.

* MB: Supermicro PDSMI+ (Intel ICH7-based)
* CPU: Intel Core 2 Duo E6600
* RAM: Corsair CM2X1024-6400 DDR2, 2GB
* ad4: WD Caviar SE WD2000JD (boot/OS)
* ad6: Seagate Barracuda 7200.10 ST3500630AS
* ad8: WD Caviar SE16 WD5000AAKS (ZFS stripe)
* ad10: WD Caviar SE16 WD5000AAKS (ZFS stripe)
* All drives are hooked up to the ICH7.
* SMART stats showed no problems on any of the drives before or after.
* RELENG_7, i386, ULE scheduler.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ad8: TIMEOUT - WRITE_DMA errors UFS 7.0-RC1
Jeremy Chadwick wrote:

If this is widespread, I think the chances are slim that it is a hardware problem in every case.

I'm in definite agreement here. I think it might be worthwhile to note what hardware we're all using, in case there's something similar between our systems (chipset, disk vendor, etc.).

My system is as follows; timeouts were reported during an rsync of data from the ZFS stripe (ad8+ad10) to a UFS2 filesystem on ad6. The system eventually panic'd after remaining deadlocked (while kernel messages about timeouts kept printing on the console, for ad6 only) for 10-15 minutes.

* MB: Supermicro PDSMI+ (Intel ICH7-based)
* CPU: Intel Core 2 Duo E6600
* RAM: Corsair CM2X1024-6400 DDR2, 2GB
* ad4: WD Caviar SE WD2000JD (boot/OS)
* ad6: Seagate Barracuda 7200.10 ST3500630AS
* ad8: WD Caviar SE16 WD5000AAKS (ZFS stripe)
* ad10: WD Caviar SE16 WD5000AAKS (ZFS stripe)
* All drives are hooked up to the ICH7.
* SMART stats showed no problems on any of the drives before or after.
* RELENG_7, i386, ULE scheduler.

Mine is as follows:

* MB: Tyan Trinity S2099
* CPU: Pentium 4, 2.4GHz
* RAM: Crucial DDR, ECC, CL2.5, Unbuffered 2GB (1/2 PC2100, 1/2 PC2700)
* ad0: Seagate ST3500630A 3.AAE (1 UFS2 boot, 1 ZFS pool)
* ad1: Seagate ST3160812A 3.AAH (not used by FreeBSD)
* Intel ICH4 UDMA100 controller
* ATI Radeon RV280 9250
* Intel PRO/1000 NIC
* 7.0-RC1, i386, ULE scheduler

-Joe
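To make the "what do our systems have in common" comparison concrete, here is a small Python sketch that tags each report and counts how many reports share each tag. The tags are my own shorthand for details stated in this thread (UFS and UFS2 are folded into one tag, and anything a poster did not mention is simply left out), so treat the result as a rough tally rather than a diagnosis.

```python
from collections import Counter

# Tags taken from the reports in this thread; missing details are omitted, not guessed.
reports = {
    "Jeremy":  {"chipset:Intel ICH7", "disk:Western Digital", "disk:Seagate",
                "arch:i386", "sched:ULE", "fs:ZFS", "fs:UFS"},
    "Joe":     {"chipset:Intel ICH4", "disk:Seagate",
                "arch:i386", "sched:ULE", "fs:ZFS", "fs:UFS"},
    "Nikolaj": {"chipset:VIA 8237A", "disk:Samsung", "fs:UFS"},
    "Remco":   {"chipset:AMD 690G", "disk:Western Digital",
                "arch:amd64", "fs:ZFS", "fs:UFS"},
}

# How many of the four reports carry each tag.
tag_counts = Counter(tag for tags in reports.values() for tag in tags)

# Tags present in every single report.
common_to_all = set.intersection(*reports.values())

for tag, n in sorted(tag_counts.items(), key=lambda item: (-item[1], item[0])):
    print(f"{n}/{len(reports)}  {tag}")
print("common to all:", sorted(common_to_all))
```

On these four reports no chipset or disk vendor is shared by everyone, which fits the suspicion voiced above that a single hardware fault is unlikely to explain every case.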