draid uses the raidz allocator, so it is subject to the same allocation efficiency trade-offs as raidz, described here:
https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz

The short answer is: avoid using raidz with a small recordsize or volblocksize relative to the minimum physically allocatable size of the disks. Too small is bad and too big is bad; somewhere in the middle is a reasonable trade-off between performance and space efficiency.
 -- richard
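The "minimum physically allocatable size" here is 2^ashift bytes per disk, so the trade-off plays out between recordsize (or volblocksize) and the sector size times the number of data disks in each group. A minimal way to inspect and tune this, assuming a dataset named hqs1p1/backup (the dataset name is only an example; recordsize changes affect newly written blocks only):

    # minimum allocatable unit per disk is 2^ashift bytes (12 -> 4 KiB);
    # 0 means it was left to auto-detect, in which case zdb -C shows the
    # per-vdev value
    zpool get ashift hqs1p1

    # too small relative to (data disks x 2^ashift) wastes space on
    # padding and parity; too large hurts small random I/O
    zfs get recordsize hqs1p1/backup
    zfs set recordsize=1M hqs1p1/backup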
On Tue, Dec 6, 2022 at 6:59 PM Phil Harman <phil.har...@gmail.com> wrote:
> Hi Richard, a very long time indeed!
>
> I was thinking more of the accumulated allocation (not instantaneous
> IOPS).
>
> The pool was created with both DRAID vdevs from the get-go. We had 32x
> 12TB and 12x 14TB drives to repurpose, and wanted to build two identical
> pools.
>
> Our other two servers have 16x new 18TB. Interestingly (and more by luck
> than design), the usable space across all four servers is the same.
>
> In this pool, both vdevs use a 3d+2p RAID scheme. But whereas
> draid2:3d:16c:1s-0 has three 3d+2p groups plus one spare (16x 12TB
> drives), draid2:3d:6c:1s-1 has just one 3d+2p group plus one spare
> (6x 14TB drives).
>
> The disappointing thing is the lop-sided allocation:
> draid2:3d:6c:1s-1 has 50.6T allocated and 3.96T free (i.e. 92%
> allocated), whereas draid2:3d:16c:1s-0 has only 128T allocated and
> 63.1T free (i.e. about 67% allocated).
>
> Phil
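For readers unfamiliar with the draid naming above: the spec reads draid<parity>:<data>d:<children>c:<spares>s, so the two layouts Phil describes work out as (16 children - 1 spare) / (3 data + 2 parity) = 3 redundancy groups, and (6 - 1) / (3 + 2) = 1 group. A rough sketch of a create command that would produce this shape (the pool name matches the output below, but the disk names are placeholders; a real invocation would normally use persistent /dev/disk/by-id paths):

    # two draid vdevs in one pool; zpool may want -f because the
    # vdev widths differ
    zpool create hqs1p1 \
        draid2:3d:16c:1s sda sdb sdc sdd sde sdf sdg sdh \
                         sdi sdj sdk sdl sdm sdn sdo sdp \
        draid2:3d:6c:1s  sdq sdr sds sdt sdu sdv

The distributed spares draid2-0-0 and draid2-1-0 listed under "spares" in the status output come from the ":1s" in each spec; they are logical spares carved out of all the member disks rather than dedicated drives.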
> On Tue, 6 Dec 2022 at 13:10, Richard Elling <richard.ell...@richardelling.com> wrote:
>
>> Hi Phil, long time, no see. comments embedded below
>>
>> On Tue, Dec 6, 2022 at 3:35 AM Phil Harman <phil.har...@gmail.com> wrote:
>>
>>> I have a number of "ZFS backup" servers (about 1PB split between four
>>> machines).
>>>
>>> Some of them have 16x 18TB drives, but a couple have a mix of 12TB and
>>> 14TB drives (because that's what we had).
>>>
>>> All are running Ubuntu 20.04 LTS (with a snapshot build) or 22.04 LTS
>>> (bundled).
>>>
>>> We're just doing our first replace (actually swapping in a 16TB drive
>>> for a 12TB because that's all we have).
>>>
>>> root@hqs1:~# zpool version
>>> zfs-2.1.99-784_gae07fc139
>>> zfs-kmod-2.1.99-784_gae07fc139
>>> root@hqs1:~# zpool status
>>>   pool: hqs1p1
>>>  state: DEGRADED
>>> status: One or more devices is currently being resilvered. The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>   scan: resilver in progress since Mon Dec 5 15:19:31 2022
>>>         177T scanned at 2.49G/s, 175T issued at 2.46G/s, 177T total
>>>         7.95T resilvered, 98.97% done, 00:12:39 to go
>>> config:
>>>
>>>         NAME                                      STATE     READ WRITE CKSUM
>>>         hqs1p1                                    DEGRADED     0     0     0
>>>           draid2:3d:16c:1s-0                      DEGRADED     0     0     0
>>>             780530e1-d2e4-0040-aa8b-8c7bed75a14a  ONLINE       0     0     0  (resilvering)
>>>             9c4428e8-d16f-3849-97d9-22fc441750dc  ONLINE       0     0     0  (resilvering)
>>>             0e148b1d-69a3-3345-9478-343ecf6b855d  ONLINE       0     0     0  (resilvering)
>>>             98208ffe-4b31-564f-832d-5744c809f163  ONLINE       0     0     0  (resilvering)
>>>             3ac46b0a-9c46-e14f-8137-69227f3a890a  ONLINE       0     0     0  (resilvering)
>>>             44e8f62f-5d49-c345-9c89-ac82926d42b7  ONLINE       0     0     0  (resilvering)
>>>             968dbacd-1d85-0b40-a1fc-977a09ac5aaa  ONLINE       0     0     0  (resilvering)
>>>             e7ca2666-1067-f54c-b723-b464fb0a5fa3  ONLINE       0     0     0  (resilvering)
>>>             318ff075-8860-e84e-8063-f77775f57a2d  ONLINE       0     0     0  (resilvering)
>>>             replacing-9                           DEGRADED     0     0     0
>>>               2888151727045752617                 UNAVAIL      0     0     0  was /dev/disk/by-partuuid/85fa9347-8359-4942-a20d-da1f6016ea48
>>>               sdd                                 ONLINE       0     0     0  (resilvering)
>>>             fd69f284-d05d-f145-9bdb-0da8a72bf311  ONLINE       0     0     0  (resilvering)
>>>             f40f997a-33a1-2a4e-bb8d-64223c441f0f  ONLINE       0     0     0  (resilvering)
>>>             dbc35ea9-95d1-bd40-b79e-90d8a37079a6  ONLINE       0     0     0  (resilvering)
>>>             ac62bf3e-517e-a444-ae4f-a784b81cd14c  ONLINE       0     0     0  (resilvering)
>>>             d211031c-54d4-2443-853c-7e5c075b28ab  ONLINE       0     0     0  (resilvering)
>>>             06ba16e5-05cf-9b45-a267-510bfe98ceb1  ONLINE       0     0     0  (resilvering)
>>>           draid2:3d:6c:1s-1                       ONLINE       0     0     0
>>>             be297802-095c-7d43-9132-360627ba8ceb  ONLINE       0     0     0
>>>             e849981c-7316-cb47-b926-61d444790518  ONLINE       0     0     0
>>>             bbc6d66d-38e1-c448-9d00-10ba7adcd371  ONLINE       0     0     0
>>>             9fb44c95-5ea6-2347-ae97-38de283f45bf  ONLINE       0     0     0
>>>             b212cae5-5068-8740-b120-0618ad459c1f  ONLINE       0     0     0
>>>             8c771f6b-7d48-e744-9e25-847230fd2fdd  ONLINE       0     0     0
>>>         spares
>>>           draid2-0-0                              AVAIL
>>>           draid2-1-0                              AVAIL
>>>
>>> errors: No known data errors
>>> root@hqs1:~#
>>>
>>> Here's a snippet of zpool iostat -v 1 ...
>>>                                             capacity     operations     bandwidth
>>> pool                                      alloc   free   read  write   read  write
>>> ----------------------------------------  -----  -----  -----  -----  -----  -----
>>> hqs1p1                                     178T  67.1T  5.68K    259   585M   144M
>>>   draid2:3d:16c:1s-0                       128T  63.1T  5.66K    259   585M   144M
>>>     780530e1-d2e4-0040-aa8b-8c7bed75a14a      -      -    208      0  25.4M      0
>>>     9c4428e8-d16f-3849-97d9-22fc441750dc      -      -   1010      2   102M  23.7K
>>>     0e148b1d-69a3-3345-9478-343ecf6b855d      -      -    145      0  34.1M  15.8K
>>>     98208ffe-4b31-564f-832d-5744c809f163      -      -    101      1  28.3M  7.90K
>>>     3ac46b0a-9c46-e14f-8137-69227f3a890a      -      -    511      0  53.5M      0
>>>     44e8f62f-5d49-c345-9c89-ac82926d42b7      -      -     12      0  4.82M      0
>>>     968dbacd-1d85-0b40-a1fc-977a09ac5aaa      -      -     22      0  5.43M  15.8K
>>>     e7ca2666-1067-f54c-b723-b464fb0a5fa3      -      -    227      2  36.7M  23.7K
>>>     318ff075-8860-e84e-8063-f77775f57a2d      -      -    999      1  83.1M  7.90K
>>>     replacing-9                                -      -      0    243      0   144M
>>>       2888151727045752617                      -      -      0      0      0      0
>>>       sdd                                      -      -      0    243      0   144M
>>>     fd69f284-d05d-f145-9bdb-0da8a72bf311      -      -    306      0  54.5M  15.8K
>>>     f40f997a-33a1-2a4e-bb8d-64223c441f0f      -      -     47      0  16.9M      0
>>>     dbc35ea9-95d1-bd40-b79e-90d8a37079a6      -      -    234      0  16.7M  15.8K
>>>     ac62bf3e-517e-a444-ae4f-a784b81cd14c      -      -    417      0  15.9M      0
>>>     d211031c-54d4-2443-853c-7e5c075b28ab      -      -    911      0  48.4M  15.8K
>>>     06ba16e5-05cf-9b45-a267-510bfe98ceb1      -      -    643      0  60.1M  15.8K
>>>   draid2:3d:6c:1s-1                        50.6T  3.96T     19      0   198K      0
>>>     be297802-095c-7d43-9132-360627ba8ceb      -      -      3      0  39.5K      0
>>>     e849981c-7316-cb47-b926-61d444790518      -      -      2      0  23.7K      0
>>>     bbc6d66d-38e1-c448-9d00-10ba7adcd371      -      -      2      0  23.7K      0
>>>     9fb44c95-5ea6-2347-ae97-38de283f45bf      -      -      3      0  39.5K      0
>>>     b212cae5-5068-8740-b120-0618ad459c1f      -      -      3      0  39.5K      0
>>>     8c771f6b-7d48-e744-9e25-847230fd2fdd      -      -      1      0  31.6K      0
>>> ----------------------------------------  -----  -----  -----  -----  -----  -----
>>>
>>> Lots of DRAID goodness there. Seems to be resilvering at quite a good
>>> whack.
>>>
>>> My main question is: why have the DRAID vdevs been so disproportionately
>>> allocated?
>>
>> When doing a replace to disk, which is not the same as a replace to a
>> logical spare, the disk's write performance is the bottleneck. The draid
>> rebuild time improvements only apply when rebuilding onto a logical spare.
>>
>> The data should be spread about nicely, but I don't think you'll see that
>> with a short time interval. Over perhaps 100 seconds or so, it should look
>> more randomly spread.
>> -- richard
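The distinction Richard draws here is between a healing resilver onto a single replacement disk and a sequential rebuild onto a draid distributed spare. A rough sketch of the two paths, reusing the pool name, failed-device GUID, spare name, and replacement disk from the output above (in practice a replacement disk would usually be given as a stable /dev/disk/by-id path rather than sdd):

    # sequential rebuild onto the distributed spare: writes are spread
    # across all remaining disks in the draid vdev, so the rebuild is not
    # limited by any single disk's write throughput
    zpool replace hqs1p1 2888151727045752617 draid2-0-0

    # healing resilver onto a physical disk (what this pool is doing):
    # bottlenecked by the one new disk's write speed, the ~144M/s visible
    # on sdd in the iostat snippet
    zpool replace hqs1p1 2888151727045752617 sdd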