draid uses the raidz allocator, so it is subject to the same allocation efficiency trade-offs as raidz, described here:
https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz

The short answer is: avoid using raidz with a small recordsize or volblocksize relative to the minimum physically allocatable size of the disks. Too small is bad and too big is bad; somewhere in the middle is a reasonable trade-off between performance and space efficiency.
 -- richard
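The "minimum physically allocatable size" here is 2^ashift bytes per disk, so the trade-off plays out between recordsize (or volblocksize) and the sector size times the number of data disks in each group. A minimal way to inspect and tune this, assuming a dataset named hqs1p1/backup (the dataset name is only an example; recordsize changes affect newly written blocks only):

    # minimum allocatable unit per disk is 2^ashift bytes (12 -> 4 KiB);
    # 0 means it was left to auto-detect, in which case zdb -C shows the
    # per-vdev value
    zpool get ashift hqs1p1

    # too small relative to (data disks x 2^ashift) wastes space on
    # padding and parity; too large hurts small random I/O
    zfs get recordsize hqs1p1/backup
    zfs set recordsize=1M hqs1p1/backup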
On Tue, Dec 6, 2022 at 6:59 PM Phil Harman <phil.har...@gmail.com> wrote:
> Hi Richard, a very long time indeed!
>
> I was thinking more of the accumulated allocation (not instantaneous
> IOPS).
>
> The pool was created with both DRAID vdevs from the get-go. We had 32x
> 12TB and 12x 14TB drives to repurpose, and wanted to build two identical
> pools.
>
> Our other two servers have 16x new 18TB. Interestingly (and more by luck
> than design), the usable space across all four servers is the same.
>
> In this pool, both vdevs use a 3d+2p RAID scheme. But whereas
> draid2:3d:16c:1s-0 has three 3d+2p groups plus one spare (16x 12TB
> drives), draid2:3d:6c:1s-1 has just one 3d+2p group plus one spare
> (6x 14TB drives).
>
> The disappointing thing is the lop-sided allocation:
> draid2:3d:6c:1s-1 has 50.6T allocated and 3.96T free (i.e. 92%
> allocated), whereas draid2:3d:16c:1s-0 has only 128T allocated and
> 63.1T free (i.e. about 67% allocated).
>
> Phil
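For readers unfamiliar with the draid naming above: the spec reads draid<parity>:<data>d:<children>c:<spares>s, so the two layouts Phil describes work out as (16 children - 1 spare) / (3 data + 2 parity) = 3 redundancy groups, and (6 - 1) / (3 + 2) = 1 group. A rough sketch of a create command that would produce this shape (the pool name matches the output below, but the disk names are placeholders; a real invocation would normally use persistent /dev/disk/by-id paths):

    # two draid vdevs in one pool; zpool may want -f because the
    # vdev widths differ
    zpool create hqs1p1 \
        draid2:3d:16c:1s sda sdb sdc sdd sde sdf sdg sdh \
                         sdi sdj sdk sdl sdm sdn sdo sdp \
        draid2:3d:6c:1s  sdq sdr sds sdt sdu sdv

The distributed spares draid2-0-0 and draid2-1-0 listed under "spares" in the status output come from the ":1s" in each spec; they are logical spares carved out of all the member disks rather than dedicated drives.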
> On Tue, 6 Dec 2022 at 13:10, Richard Elling <richard.ell...@richardelling.com> wrote:
>
>> Hi Phil, long time, no see. comments embedded below
>>
>> On Tue, Dec 6, 2022 at 3:35 AM Phil Harman <phil.har...@gmail.com> wrote:
>>
>>> I have a number of "ZFS backup" servers (about 1PB split between four
>>> machines).
>>>
>>> Some of them have 16x 18TB drives, but a couple have a mix of 12TB and
>>> 14TB drives (because that's what we had).
>>>
>>> All are running Ubuntu 20.04 LTS (with a snapshot build) or 22.04 LTS
>>> (bundled).
>>>
>>> We're just doing our first replace (actually swapping in a 16TB drive
>>> for a 12TB because that's all we have).
>>>
>>> root@hqs1:~# zpool version
>>> zfs-2.1.99-784_gae07fc139
>>> zfs-kmod-2.1.99-784_gae07fc139
>>> root@hqs1:~# zpool status
>>>   pool: hqs1p1
>>>  state: DEGRADED
>>> status: One or more devices is currently being resilvered. The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>   scan: resilver in progress since Mon Dec 5 15:19:31 2022
>>>         177T scanned at 2.49G/s, 175T issued at 2.46G/s, 177T total
>>>         7.95T resilvered, 98.97% done, 00:12:39 to go
>>> config:
>>>
>>>         NAME                                      STATE     READ WRITE CKSUM
>>>         hqs1p1                                    DEGRADED     0     0     0
>>>           draid2:3d:16c:1s-0                      DEGRADED     0     0     0
>>>             780530e1-d2e4-0040-aa8b-8c7bed75a14a  ONLINE       0     0     0  (resilvering)
>>>             9c4428e8-d16f-3849-97d9-22fc441750dc  ONLINE       0     0     0  (resilvering)
>>>             0e148b1d-69a3-3345-9478-343ecf6b855d  ONLINE       0     0     0  (resilvering)
>>>             98208ffe-4b31-564f-832d-5744c809f163  ONLINE       0     0     0  (resilvering)
>>>             3ac46b0a-9c46-e14f-8137-69227f3a890a  ONLINE       0     0     0  (resilvering)
>>>             44e8f62f-5d49-c345-9c89-ac82926d42b7  ONLINE       0     0     0  (resilvering)
>>>             968dbacd-1d85-0b40-a1fc-977a09ac5aaa  ONLINE       0     0     0  (resilvering)
>>>             e7ca2666-1067-f54c-b723-b464fb0a5fa3  ONLINE       0     0     0  (resilvering)
>>>             318ff075-8860-e84e-8063-f77775f57a2d  ONLINE       0     0     0  (resilvering)
>>>             replacing-9                           DEGRADED     0     0     0
>>>               2888151727045752617                 UNAVAIL      0     0     0  was /dev/disk/by-partuuid/85fa9347-8359-4942-a20d-da1f6016ea48
>>>               sdd                                 ONLINE       0     0     0  (resilvering)
>>>             fd69f284-d05d-f145-9bdb-0da8a72bf311  ONLINE       0     0     0  (resilvering)
>>>             f40f997a-33a1-2a4e-bb8d-64223c441f0f  ONLINE       0     0     0  (resilvering)
>>>             dbc35ea9-95d1-bd40-b79e-90d8a37079a6  ONLINE       0     0     0  (resilvering)
>>>             ac62bf3e-517e-a444-ae4f-a784b81cd14c  ONLINE       0     0     0  (resilvering)
>>>             d211031c-54d4-2443-853c-7e5c075b28ab  ONLINE       0     0     0  (resilvering)
>>>             06ba16e5-05cf-9b45-a267-510bfe98ceb1  ONLINE       0     0     0  (resilvering)
>>>           draid2:3d:6c:1s-1                       ONLINE       0     0     0
>>>             be297802-095c-7d43-9132-360627ba8ceb  ONLINE       0     0     0
>>>             e849981c-7316-cb47-b926-61d444790518  ONLINE       0     0     0
>>>             bbc6d66d-38e1-c448-9d00-10ba7adcd371  ONLINE       0     0     0
>>>             9fb44c95-5ea6-2347-ae97-38de283f45bf  ONLINE       0     0     0
>>>             b212cae5-5068-8740-b120-0618ad459c1f  ONLINE       0     0     0
>>>             8c771f6b-7d48-e744-9e25-847230fd2fdd  ONLINE       0     0     0
>>>         spares
>>>           draid2-0-0                              AVAIL
>>>           draid2-1-0                              AVAIL
>>>
>>> errors: No known data errors
>>> root@hqs1:~#
>>>
>>> Here's a snippet of zpool iostat -v 1 ...
>>>                                             capacity     operations     bandwidth
>>> pool                                      alloc   free   read  write   read  write
>>> ----------------------------------------  -----  -----  -----  -----  -----  -----
>>> hqs1p1                                     178T  67.1T  5.68K    259   585M   144M
>>>   draid2:3d:16c:1s-0                       128T  63.1T  5.66K    259   585M   144M
>>>     780530e1-d2e4-0040-aa8b-8c7bed75a14a      -      -    208      0  25.4M      0
>>>     9c4428e8-d16f-3849-97d9-22fc441750dc      -      -   1010      2   102M  23.7K
>>>     0e148b1d-69a3-3345-9478-343ecf6b855d      -      -    145      0  34.1M  15.8K
>>>     98208ffe-4b31-564f-832d-5744c809f163      -      -    101      1  28.3M  7.90K
>>>     3ac46b0a-9c46-e14f-8137-69227f3a890a      -      -    511      0  53.5M      0
>>>     44e8f62f-5d49-c345-9c89-ac82926d42b7      -      -     12      0  4.82M      0
>>>     968dbacd-1d85-0b40-a1fc-977a09ac5aaa      -      -     22      0  5.43M  15.8K
>>>     e7ca2666-1067-f54c-b723-b464fb0a5fa3      -      -    227      2  36.7M  23.7K
>>>     318ff075-8860-e84e-8063-f77775f57a2d      -      -    999      1  83.1M  7.90K
>>>     replacing-9                                -      -      0    243      0   144M
>>>       2888151727045752617                      -      -      0      0      0      0
>>>       sdd                                      -      -      0    243      0   144M
>>>     fd69f284-d05d-f145-9bdb-0da8a72bf311      -      -    306      0  54.5M  15.8K
>>>     f40f997a-33a1-2a4e-bb8d-64223c441f0f      -      -     47      0  16.9M      0
>>>     dbc35ea9-95d1-bd40-b79e-90d8a37079a6      -      -    234      0  16.7M  15.8K
>>>     ac62bf3e-517e-a444-ae4f-a784b81cd14c      -      -    417      0  15.9M      0
>>>     d211031c-54d4-2443-853c-7e5c075b28ab      -      -    911      0  48.4M  15.8K
>>>     06ba16e5-05cf-9b45-a267-510bfe98ceb1      -      -    643      0  60.1M  15.8K
>>>   draid2:3d:6c:1s-1                        50.6T  3.96T     19      0   198K      0
>>>     be297802-095c-7d43-9132-360627ba8ceb      -      -      3      0  39.5K      0
>>>     e849981c-7316-cb47-b926-61d444790518      -      -      2      0  23.7K      0
>>>     bbc6d66d-38e1-c448-9d00-10ba7adcd371      -      -      2      0  23.7K      0
>>>     9fb44c95-5ea6-2347-ae97-38de283f45bf      -      -      3      0  39.5K      0
>>>     b212cae5-5068-8740-b120-0618ad459c1f      -      -      3      0  39.5K      0
>>>     8c771f6b-7d48-e744-9e25-847230fd2fdd      -      -      1      0  31.6K      0
>>> ----------------------------------------  -----  -----  -----  -----  -----  -----
>>>
>>> Lots of DRAID goodness there. Seems to be resilvering at quite a good
>>> whack.
>>>
>>> My main question is: why have the DRAID vdevs been so disproportionately
>>> allocated?
>>
>> When doing a replace to disk, which is not the same as a replace to a
>> logical spare, the disk's write performance is the bottleneck. The draid
>> rebuild time improvements only apply when rebuilding onto a logical spare.
>>
>> The data should be spread about nicely, but I don't think you'll see that
>> with a short time interval. Over perhaps 100 seconds or so, it should look
>> more randomly spread.
>> -- richard
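The distinction Richard draws here is between a healing resilver onto a single replacement disk and a sequential rebuild onto a draid distributed spare. A rough sketch of the two paths, reusing the pool name, failed-device GUID, spare name, and replacement disk from the output above (in practice a replacement disk would usually be given as a stable /dev/disk/by-id path rather than sdd):

    # sequential rebuild onto the distributed spare: writes are spread
    # across all remaining disks in the draid vdev, so the rebuild is not
    # limited by any single disk's write throughput
    zpool replace hqs1p1 2888151727045752617 draid2-0-0

    # healing resilver onto a physical disk (what this pool is doing):
    # bottlenecked by the one new disk's write speed, the ~144M/s visible
    # on sdd in the iostat snippet
    zpool replace hqs1p1 2888151727045752617 sdd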