Re: Terrible disk performance with LSI / FreeBSD 9.2-RC1
To follow up on this issue, at one point the stats were down to this:

                        extended device statistics
    device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
    da0        0.0   0.0     0.0     0.0    0   0.0   0
    da1        0.0   0.0     0.0     0.0    0   0.0   0
    da2      127.9   0.0   202.3     0.0    1  47.5 100
    da3      125.9   0.0   189.3     0.0    1  43.1  97
    da4      127.9   0.0   189.8     0.0    1  45.8 100
    da5      128.9   0.0   206.3     0.0    0  42.5  99
    da6      127.9   0.0   202.3     0.0    1  46.2  98
    da7        0.0 249.7     0.0   334.2   10  39.5 100

At some point, I figured out that 125 random IOPS is pretty much the limit for 7200 RPM SATA drives. So mostly what we're looking at here is that resilvering a raidz2 is the pathological worst case. Lesson learned: raidz2 is just really not viable without some kind of sort on the resilver operations. I wish I understood ZFS well enough to do something about that, but research suggests the problem is non-trivial. :(

There also seems to be a separate ZFS issue related to having a very large number of snapshots (e.g. hourly for several months on a couple of filesystems). Some combination of the OS updates we've been doing while trying to get this machine to 9.2-RC1 and deleting a ton of snapshots seems to have helped with that. It would be nice to know which it was; I guess we'll find out in a few months.

So it seems like the combination of these two issues is mostly what is/was plaguing us.

Thanks!
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
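One way to see why a ~125 IOPS ceiling hurts so much during the resilver is to multiply IOPS by the average request size. The average read size falls straight out of iostat as kr/s divided by r/s. A minimal Python sketch using the da2-da6 rows from the final snapshot in the follow-up above (the helper name is hypothetical):

```python
# Rows from the final iostat -x snapshot: device -> (r/s, kr/s).
SNAPSHOT = {
    "da2": (127.9, 202.3),
    "da3": (125.9, 189.3),
    "da4": (127.9, 189.8),
    "da5": (128.9, 206.3),
    "da6": (127.9, 202.3),
}


def avg_read_kb(reads_per_sec, kb_per_sec):
    """Average size of one read request in KB (kr/s divided by r/s)."""
    if reads_per_sec == 0:
        return 0.0  # idle device: no requests to average over
    return kb_per_sec / reads_per_sec


for dev, (rps, krps) in SNAPSHOT.items():
    size = avg_read_kb(rps, krps)
    # Throughput is simply request rate times request size.
    print(f"{dev}: {rps:6.1f} reads/s x {size:4.2f} KB/read = {rps * size:6.1f} KB/s")
```

Every drive works out to about 1.5 to 1.6 KB per read, so even pinned at the random-IOPS limit each one moves only around 200 KB/s.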
Re: Terrible disk performance with LSI / FreeBSD 9.2-RC1
On Wed, Aug 7, 2013 at 3:15 PM, James Gosnell wrote:
> Maybe one of your drives is bad, so it's constantly doing error correction?

Not according to SMART; all the drives report no problems. Also, all the drives seem to perform in lock-step for both reading and writing. E.g. when one drive in an array is failing, all the drives may be pulling the same # of reads, but the failing drive will often report 100% busy and/or multi-second svc_t's while the others sit at 4% with 20 msec svc_t's or similar. In this case, it's acting like the disks are all hugely overloaded, except without even the high svc_t's I typically associate with overworking an array.

The speeds do fluctuate. Last night it was down to 64k/sec reads per drive (about 15 reads/sec) and still reporting 90% busy on all drives. It feels like some sort of issue with the bus/controller/kernel/driver/ZFS that is affecting all the drives equally.

Also, even ls takes forever (10-30 seconds for "ls -lh /"), but when it eventually does finish, "time ls -lh /" reports:

    0.02 real 0.00 user 0.00 sys

Really not sure what to make of that. An attempt to do "ps axlww | fgrep ls" while the ls was running failed, because the ps hangs just as long as the ls. So it's like the system is just repeatedly putting anything that touches the disks on hold, even if all the data being requested is clearly in cache (apparently even loading the binary for /bin/ls, or doing "ls -lh /" twice in a row).

Thanks!
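The failing-drive pattern described above (one disk with multi-second svc_t while its peers sit around 20 msec) can be checked mechanically. A minimal sketch, assuming the svc_t column has already been pulled out of iostat per drive; the function name and the 5x-median threshold are arbitrary assumptions, not a FreeBSD or ZFS rule, and the FAILING numbers are hypothetical:

```python
from statistics import median

# svc_t (ms) per drive from the first post's iostat snapshot (da2-da6).
LOCKSTEP = {"da2": 33.0, "da3": 32.3, "da4": 32.1, "da5": 31.6, "da6": 31.4}

# Hypothetical numbers for the failing-drive pattern: one drive at
# multi-second svc_t while its peers sit around 20 ms.
FAILING = {"da2": 20.0, "da3": 20.0, "da4": 2500.0, "da5": 20.0, "da6": 20.0}


def suspect_drives(svc_times, factor=5.0, floor_ms=1.0):
    """Return drives whose svc_t exceeds `factor` times the median svc_t
    of the other drives in the array.

    `floor_ms` keeps a near-idle baseline (median close to 0) from
    flagging every drive that is doing any work at all.
    """
    suspects = []
    for dev, svc in svc_times.items():
        others = [s for d, s in svc_times.items() if d != dev]
        if not others:
            continue  # a single drive has nothing to compare against
        baseline = max(median(others), floor_ms)
        if svc > factor * baseline:
            suspects.append(dev)
    return suspects


print(suspect_drives(LOCKSTEP))  # []
print(suspect_drives(FAILING))   # ['da4']
```

Applied to the lock-step numbers it flags nothing, which matches the observation that every drive looked equally overloaded rather than one looking sick.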
Terrible disk performance with LSI / FreeBSD 9.2-RC1
We have a machine running 9.2-RC1 that's getting terrible disk I/O performance. Its performance has always been pretty bad, but it didn't really become clear how bad until we did a zpool replace on one of the drives and realized it was going to take 3 weeks to rebuild a <1TB drive.

The hardware specs are:

- 2 x Xeon L5420
- 32 GiB RAM
- LSI Logic SAS 1068E
- 2 x 32GB SSDs
- 6 x 1TB Western Digital RE3 7200RPM SATA

The LSI controller has the most recent firmware I'm aware of (6.36.00.00 / 1.33.00.00 dated 2011.08.24), is in IT mode, and appears to be working fine:

    mpt0 Adapter:
           Board Name: USASLP-L8i
       Board Assembly: USASLP-L8i
            Chip Name: C1068E
        Chip Revision: B3
          RAID Levels: none
    mpt0 Configuration: 0 volumes, 8 drives
        drive da0 (30G) ONLINE SATA
        drive da1 (29G) ONLINE SATA
        drive da2 (931G) ONLINE SATA
        drive da3 (931G) ONLINE SATA
        drive da4 (931G) ONLINE SATA
        drive da5 (931G) ONLINE SATA
        drive da6 (931G) ONLINE SATA
        drive da7 (931G) ONLINE SATA

The eight drives are configured as ZIL and L2ARC on the SSDs and a six-drive raidz2 on the spinning disks. We did a ZFS replace on the last drive in the line, and the resilver is proceeding at less than 800k/sec:

                        extended device statistics
    device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
    da0        0.0   0.0     0.0     0.1    0   0.9   0
    da1        0.0   8.2     0.0    19.9    0   0.1   0
    da2      125.6  23.0   768.2    40.5    4  33.0  88
    da3      126.6  23.1   769.0    41.3    4  32.3  89
    da4      126.0  24.0   768.5    42.7    4  32.1  88
    da5      125.9  22.0   768.2    40.1    4  31.6  87
    da6      124.0  22.0   766.6    39.9    5  31.4  84
    da7        0.0 136.9     0.0   801.3    0   0.6   4

The system has plenty of free RAM, is 99.7% idle, has nothing else going on, and runs like a one-legged dog. There are no error messages or any sign of a problem anywhere, other than the really terrible performance. (When not rebuilding, it does light NFS duty. That performance is similarly bad, but has never really mattered.)

Similar systems running Solaris put out 10x these numbers while reporting 30% busy instead of 90% busy.

Does anyone have any suggestions for how I could troubleshoot this further?
At this point, I'm kind of at a loss as to where to go from here. My goal is to try to phase out the Solaris machines, but this is kind of a roadblock.

Thanks for any advice!
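For a sense of scale, the resilver rate in the original post converts to days as below. A minimal Python sketch, assuming iostat's kw/s column is in KiB/s and that the rate holds steady; the helper name is hypothetical, and the full 931G drive is used as an upper bound on the data to rebuild:

```python
KIB = 1024
GIB = 1024 ** 3
SECONDS_PER_DAY = 86_400


def resilver_days(data_bytes, kib_per_sec):
    """Days needed to write `data_bytes` at a constant `kib_per_sec`.

    Assumes iostat's kw/s column is in KiB/s and that the rate never
    changes; a real resilver rate fluctuates.
    """
    seconds = data_bytes / (kib_per_sec * KIB)
    return seconds / SECONDS_PER_DAY


# Upper bound on data: the whole 931G replacement drive (da7),
# written at da7's observed 801.3 kw/s.
print(f"{resilver_days(931 * GIB, 801.3):.1f} days")  # 14.1 days
```

At the 334.2 kw/s that da7 showed in the later follow-up, the same call comes out to roughly 34 days, so the multi-week estimate is not surprising once the rate sags.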
Re: 9-STABLE doesn't boot: can't load 'kernel'
loader.conf was empty, and there are no 4k gnops, geli, or anything like that. This is a 100% normal install.

Although, since you mentioned 4k blocks, I did leave a gap between ada0p1 and ada0p2 to start the root partition on a 4k boundary. (It's an SSD that will almost never be written to once installed, so that might be a bit silly, but it's a habit already.)

I decided to try this again without the gap, and that seems to have worked. I made it through install, partitioning, updating the OS to 9-STABLE and installing new boot blocks, and it still boots. I even got it to work with a ZFS root. Here's the partition table I ended up with:

    =>        34  234441581  ada0  GPT  (111G)
              34        990     1  freebsd-boot  (495k)
            1024  226051072     2  freebsd-zfs  (107G)
       226052096    8389519     3  freebsd-swap  (4.0G)

I'm not sure why this would make a difference, but either it does or redoing the install cleared out whatever else was wrong. This box will be stress tested and rebooted quite a bit in the next few days, so I will report back if it comes unglued. :)

Thanks for the suggestion!
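Whether a partition "starts on a 4k boundary" can be checked directly from gpart show output: with 512-byte sectors, a start sector that is a multiple of 8 is 4 KiB aligned. A minimal Python sketch, assuming 512-byte logical sectors (consistent with 234441581 sectors showing as 111G); it also verifies that the rows tile the disk without overlap or unlisted gaps. The function name is hypothetical, and freebsd-boot is skipped because it only holds boot code:

```python
SECTOR_BYTES = 512                    # assumed logical sector size
ALIGN_SECTORS = 4096 // SECTOR_BYTES  # 8 sectors = 4 KiB

# (start sector, size in sectors, type) from the two `gpart show` outputs.
OLD_LAYOUT = [
    (34, 222, "freebsd-boot"),
    (256, 1792, "free"),
    (2048, 8388608, "freebsd-ufs"),
    (8390656, 8388608, "freebsd-swap"),
    (16779264, 217662351, "freebsd-zfs"),
]
NEW_LAYOUT = [
    (34, 990, "freebsd-boot"),
    (1024, 226051072, "freebsd-zfs"),
    (226052096, 8389519, "freebsd-swap"),
]
# "=> 34 234441581 ada0 GPT": first usable sector plus usable sector count.
DISK_END = 34 + 234441581


def misaligned(rows, disk_end):
    """Return (type, start) for data partitions not on a 4 KiB boundary.

    Raises ValueError if the rows overlap, leave an unlisted gap, or do
    not end exactly at disk_end. freebsd-boot and free rows are not
    alignment-checked.
    """
    bad = []
    expected = rows[0][0]
    for start, size, kind in rows:
        if start != expected:
            raise ValueError(f"{kind} starts at {start}, expected {expected}")
        if kind not in ("free", "freebsd-boot") and start % ALIGN_SECTORS:
            bad.append((kind, start))
        expected = start + size
    if expected != disk_end:
        raise ValueError(f"table ends at {expected}, disk ends at {disk_end}")
    return bad


print(misaligned(OLD_LAYOUT, DISK_END))  # []
print(misaligned(NEW_LAYOUT, DISK_END))  # []
```

Both tables come out clean: ada0p2 started at sector 2048 (1 MiB) with the gap and at sector 1024 (512 KiB) without it, so the rebuilt table did not change 4K alignment.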
9-STABLE doesn't boot: can't load 'kernel'
After installing 9.1-RELEASE amd64 on a system, it boots up fine. If I then build and install a new 9-STABLE kernel & world, reboots die in the loader with:

    can't load 'kernel'

This is a pretty straightforward system: one drive, not large (128GB SSD), GPT partitioned, gptboot boot code. One UFS root partition to boot from, a swap partition, and the rest for ZFS. (At first I tried to do this system with root-on-ZFS, but that also failed, adding "unable to load zpool by guid" or similar before the "can't load 'kernel'" message.)

Once this happens, the disk is unbootable. I can start from the install CD and access the disk just fine, but even if I move kernel.old back to kernel, it doesn't boot anymore. Likewise, it doesn't matter whether I overwrite the boot code with gptboot & pmbr from the install CD or with the new ones from /boot after installworld.

The disk looks like:

    # gpart show
    =>        34  234441581  ada0  GPT  (111G)
              34        222     1  freebsd-boot  (111k)
             256       1792        - free -  (896k)
            2048    8388608     2  freebsd-ufs  (4.0G)
         8390656    8388608     3  freebsd-swap  (4.0G)
        16779264  217662351     4  freebsd-zfs  (103G)

In the loader:

    BTX loader 1.00  BTX version is 1.02
    Consoles: internal video/keyboard
    BIOS drive C: is disk0
    BIOS 621kB/2067924kB available memory

    FreeBSD/x86 bootstrap loader, Revision 1.1
    (root@builder, Mon Apr 15 09:14:38 UTC 2013)
    can't load 'kernel'

    Type '?' for a list of commands, 'help' for more detailed help.
    OK show
    […]
    currdev=disk0p2:
    […]
    loaddev=disk0p2:
    […]
    OK lsdev
    cd devices:
    disk devices:
        disk0:   BIOS drive C:
    pxe devices:
    OK ls
    open '/' failed: no such file or directory
    OK help
    Verbose help not available, use '?' to list commands

So it's getting the boot device right (disk0p2 / ada0p2), but can't see it at all. Does anyone know what might be wrong?

Thanks for any advice!