Date: Tue, 24 Nov 2015 21:57:50 -0553.75
From: "William A. Mahaffey III" <w...@hiwaay.net>
Message-ID: <56553074.9060...@hiwaay.net>
| 4256EE1 # time dd if=/dev/zero of=/home/testfile bs=16k count=32768
| 32768+0 records in
| 32768+0 records out
| 536870912 bytes transferred in 22.475 secs (23887471 bytes/sec)
| 23.28 real 0.10 user 2.38 sys
| 4256EE1 #
|
| i.e. about 24 MB/s. I think I'd be happy enough with that, maybe it can be improved a little.
| When I zero-out parts of these drive to reinitialize
| them, I see ~120 MB/s for one drive.

Depending upon just how big those "parts" are, that number might be an illusion. You need to be writing at least about as much as you did in the test above to reduce the effects of write-behind (caching in the drive) etc. Normally a "zero to reinit" write doesn't need nearly that much (often just a few MB) - writing that much would be just to the drive's cache, and measuring that speed is just measuring DMA rate, and useless for anything.

| RAID5 stripes I/O onto the data
| drives, so I expect ~4X I/O speed w/ 4 data drives. With various
| overheads/inefficiencies, I (think I) expect 350-400 MB/s writes.

That's not going to happen. Every raid write (whatever raid level, except 0) requires 2 parallel disc writes (at least) - you need that to get the redundancy that is the R in the name - and it can also require reads. For raid 5, you write to the data drive (one of the 4 of them) and to the parity drive - that is, all writes end up having a write to the parity drive, so the upper limit on speed for a contiguous write is that of one drive (a bit less probably, depending upon which controllers are in use, as the data still needs to be transmitted twice, and each controller can only be transferring for one drive at a time .. at least for the kinds of disc controllers in consumer grade equipment.)

If both data and parity happen to be using the same controller, the max rate will certainly be less (measurably less, though perhaps not dramatically) than what you can achieve with one drive. If they're on different controllers, then in ideal circumstances you might get close to the rate you can expect from one drive.

For general purpose I/O (writes all over the filesystem, as you'd see in normal operations) that's mitigated by there not really being one parity drive - rather, all 5 drives (the 4 you think of as being the data drives, and the one you think of as being the parity drive) perform as both data and parity drives, for different segments of the raid, so there isn't really (in normal operation) a one drive bottleneck -- but with 5 drives, and 2 writes needed for each I/O, the best you could possibly do is twice as fast as a single drive in overall throughput. In practice you'd never see that however - real workloads just aren't going to be spread out that conveniently across just the right parts of the filesystems. If you ever even approach what a single drive can achieve, I'd be surprised.

Now, in the above (aside from the possible measurement error in your 120MB/s) I've been allowing you to think that's "what a single drive can achieve". It isn't. That's raw I/O onto the drive, and will run at the best possible speed that the drive can handle - everything is optimised for that case, as it is one of the (meaningless) standard benchmarks. For real use, there are also filesystem overheads to consider: your raid test was onto a file, on the raid, not onto the raw raid (though I wouldn't expect that to be all that much faster, certainly not more than about 60MB/s assuming the 120 MB/s is correct).
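Incidentally, if you want a raw single-drive number that isn't mostly the drive's cache, the write needs to be at least as big as the test above. Just as a sketch - wd5 here is a stand-in for whichever drive you're about to reinitialise anyway, the whole-disc raw partition letter ('d') differs between ports, and of course this destroys whatever is on that drive:

	# ~1 GB of zeros straight onto the raw drive, big enough to get past write-behind caching
	time dd if=/dev/zero of=/dev/rwd5d bs=64k count=16384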
To get a more valid baseline - what you can actually expect to observe - you need to be comparing apples to apples: the one drive test needs to also have a filesystem, and you need to be writing a file to it.

To test that, take your hot spare (ie: unused) drive, and build a ffs on it instead of raid (unfortunately, I think you need to reboot to get it out of being a raidframe spare first - as I recall, raidframe has no "stop being a spare" operation ... it should have, but ...). Just stop it being added back as a hot spare (assuming you are actually doing that now, spares don't get autoconfigured). (Wait till down below to see how to do the raidctl -s that shows whether the hot spare is actually configured or not - raidctl will tell you, once you run it properly.)

Then build a ffs on the spare drive (after it is no longer a spare). You'll need to change the partition type from raid to ffs in the label first - probably using disklabel, though it could be gpt. Set up the ffs with the same parameters as the filesystem on your raid (ie: -b 32768 -f 4096), mount that, copy a bunch of files onto it (make its %'ge full be about the same as whatever is on the raid filesystem you're testing - that's about 37% full from your df output later) and then try a dd like above onto that and see how fast that goes (there's a rough sketch of the command sequence further down). I can promise you it won't be anywhere near 120MB/s...

The filesystem needs data on it (not just empty) so that the block allocation strategy works about the same - writing to an empty filesystem would make it too simple to always pick the best block for the next write, and so make the speed seem faster than what is reasonable in real life.

Once you've done that and obtained the results, you can just unmount the dummy filesystem, change the partition type back to raid in the label, and add it back as a raidframe spare (no need to reboot or anything to go that direction, and no need to do anything to the drive other than change the partition label type).

Then you can properly compare the raid filesystem performance and the single disc filesystem performance, and unless the raid performance is less than half what you get from the single disc filesystem, I'd just say "OK, that's good" and be done. If the single disc filesystem is hugely faster than that 24MB/sec (say 60MB/sec or faster), which I kind of doubt it will be, then perhaps you should look at tuning the raid or filesystem params. Until then, leave it alone.

| I posted a variation of this question a while back, w/ larger amount of
| I/O, & someone else replied that they tried the same command & saw ~20X
| faster I/O than mine reported.

There are too many variables that could cause that kind of thing - different drive types, filesystem params, ...

A question to ask yourself is just what you plan on doing that is going to need more than 24MB/sec sustained write throughput? Unless your application is something like video editing, which produces lots of data very quickly (and if it is, raid5 is absolutely not what you should be using ... use raid10 instead .. you'll get less space, but much faster writes (and faster reads)).

For most normal software development however, you'll never come close to that - when my systems are ultra busy, I see more like 4-5 MB/sec sustained in overall I/O (in and out combined), with just occasional bursts above that.
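For what it's worth, the spare-drive test described above would go roughly like this. The names are only placeholders - wd5 for the spare, its 'a' partition, /mnt, and raid2 for whichever raid turns out to hold /home - and the label edit assumes a traditional disklabel rather than gpt:

	disklabel -e wd5                         # change the partition's fstype from RAID to 4.2BSD
	newfs -O 2 -b 32768 -f 4096 /dev/rwd5a   # same block/frag sizes as the raid filesystem
	mount /dev/wd5a /mnt
	# copy enough files onto /mnt to get it to roughly 37% full, then:
	time dd if=/dev/zero of=/mnt/testfile bs=16k count=32768
	umount /mnt
	disklabel -e wd5                         # fstype back to RAID
	raidctl -a /dev/wd5a raid2               # re-add it as a hot spare

The dd is the same 512 MB write you ran on the raid, so the two numbers are directly comparable.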
| ffs data from dumpfs for that FS (RAID5 mounted as /home):
|
| 4256EE1 # cat dumpfs.OUTPUT.head.txt
| file system: /dev/rdk0
| format FFSv2
| endian little-endian
| location 65536 (-b 128)

That looks good.

| bsize 32768 shift 15 mask 0xffff8000
| fsize 4096 shift 12 mask 0xfffff000

And those look to be appropriate. I deleted all the rest, none of it is immediately relevant.

| 4256EE1 # raidctl -s dk0
| raidctl: ioctl (RAIDFRAME_GET_INFO) failed: Inappropriate ioctl for device

That makes no sense - it isn't supposed to work...

| 4256EE1 # raidctl -s raid0a
| Components:
| /dev/wd0a: optimal
| /dev/wd1a: optimal

This is a raid1, and obviously isn't what you're talking about. (And normally you wouldn't give "raid0a" there, just "raid0" - I'm actually a little surprised that raid0a worked. If you wanted to be more explicit, I'd have expected /dev/raid0d to be the device name it really wants.)

There must be a raid1a (or something) that has /home mounted on it, right? Try raidctl -s raid1 [but not that, see below - probably raid2, I was replying without reading to the end first ... stupid me!] (and yes, I know it is confusing that "raid1" sometimes means "RAID Level 1" and sometimes means "the second raidframe container (disk like thing)"). In commands like raidctl it is always the second, though...

| 4256EE1 # df -h
| Filesystem     Size    Used    Avail   %Cap    Mounted on
| /dev/raid0a     16G    210M      15G     1%    /
| /dev/raid1a     63G    1.1G      59G     1%    /usr
| /dev/dk0       3.5T    1.2T     2.1T    37%    /home
| kernfs         1.0K    1.0K       0B   100%    /kern
| ptyfs          1.0K    1.0K       0B   100%    /dev/pts
| procfs         4.0K    4.0K       0B   100%    /proc
| tmpfs          8.0G    4.0K     8.0G     0%    /tmp

Oh, I see where the confusion comes from - dk0 is a wedge, probably on raid2.

Do
	sysctl hw.disknames
That will list all of the "disk type" devices in the system, which will include wd0 ... wd5, raid0, raid1, and (I expect) raid2 (as well as perhaps a bunch of dkN wedge things). You want the raidctl -s output from the raidN that is not raid0 or raid1.

You can also look in /var/log/messages (or /var/run/dmesg.boot) and see the boot time message that will tell you where dk0 comes from, or do
	dkctl dk0 getwedgeinfo
which will print something like
	dk0 at sd0: Passport_EFI
	dk0: 262144 blocks at 512, type: msdos
except in your case the "sd0" will be "raidN", the Passport_EFI will be whatever you called the wedge (its label) when it was created, and the sizes and filesystem types will obviously be different. The "raidN" part is all that matters here. The raidN is what you want for raidctl -s.

| Because of its size (> 2 TB) it was setup using dkctl & raidframe won't
| report anything about it, how can I get that info for you ? Thanks & TIA.

See above...

Once you have all the relevant numbers, it will probably take a raidframe and filesystem expert to tell you whether your layout is optimal or not. I am neither.

kre
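ps: if it helps, the whole "which raid is dk0 sitting on" chase is just these three commands (the raid2 in the last one is only my guess until the first two confirm it):

	sysctl hw.disknames
	dkctl dk0 getwedgeinfo
	raidctl -s raid2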