> On Mar 19, 2018, at 7:40 AM, Mike Gerdts <mike.ger...@joyent.com> wrote:
> 
> 
> On Fri, Mar 16, 2018 at 5:38 PM, Richard Elling 
> <richard.ell...@richardelling.com> wrote:
>> On Mar 15, 2018, at 9:48 PM, Mike Gerdts <mike.ger...@joyent.com> wrote:
>> I had started down that route, then convinced myself that it may be there 
>> for good reason.  Can you explain how a refreservation greater than the size 
>> calculated by zvol_volsize_to_reservation() is useful?  I couldn't come up 
>> with a way (aside from a potential bug in that copies is not always 
>> accounted for).
> 
> Sure, raidz skip blocks are not accounted for. In part this is because skip 
> blocks are assigned at the SPA layer while reservations are made at the DSL 
> layer. The pathological example is raidz2 on 4kn disks with volblocksize=8k 
> (default). The predicted reservation is 8k per block (logical) plus 8k 
> parity = 16k, but the actual allocated space is 24k. The DSL "free" space 
> assumes 16k, so it overestimates the usable space. Thus you can run out of 
> allocatable space in the pool before hitting refreservation -- a bad thing. 
> One way to inoculate is to increase refreservation to a value greater than 
> volsize. 
> 
> 
> Thanks for that lesson.  It seems as though it would be better to fix 
> zvol_volsize_to_reservation() to account for this rather than to leave it to 
> system operators to know about this and then each independently come up with 
> the appropriate algorithm to reserve the right amount of space.  Is the 
> algorithm for performing the proper calculation written down somewhere that 
> is publicly accessible?

Sure, the seminal blog is: 
https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz
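
As a rough illustration of the 16k versus 24k figures quoted above, here is a 
minimal Python sketch of the raidz allocation arithmetic: data sectors, plus 
parity sectors per row, padded with skip sectors to a multiple of 
(nparity + 1). The function name, its parameters, and the 6-disk vdev width 
are assumptions made for this example, not something taken from 
zvol_volsize_to_reservation() or stated in the thread.

```python
def raidz_asize(psize, ndisks, nparity, ashift):
    """Estimate bytes allocated on a raidz vdev for one block of psize bytes.

    Hypothetical helper for illustration; it follows the sector arithmetic
    described in the thread and is not the in-kernel implementation.
    """
    sector = 1 << ashift
    # Data sectors needed for the block (ceiling division).
    data = -(-psize // sector)
    # Each row of up to (ndisks - nparity) data sectors carries nparity
    # parity sectors.
    rows = -(-data // (ndisks - nparity))
    total = data + nparity * rows
    # Skip sectors pad the allocation to a multiple of (nparity + 1).
    skip = -total % (nparity + 1)
    return (total + skip) * sector


# volblocksize=8k on raidz2, assuming a 6-disk vdev.
print(raidz_asize(8192, ndisks=6, nparity=2, ashift=12))  # 24576 (24k)
print(raidz_asize(8192, ndisks=6, nparity=2, ashift=9))   # 12288 (12k)
```

With 4k sectors the block needs 2 data + 2 parity sectors (16k), and 2 skip 
sectors bring it to 24k. With 512-byte sectors the same block comes to 24 
sectors, already a multiple of 3, so no skip sectors are added.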

Each vdev can have a different allocation ratio and raid config, but if you 
pick the worst case for the pool at import/create and store it in spa_t, 
it is close at hand whenever the reservation needs to be calculated.
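
The worst-case idea could be sketched roughly as follows. This is only an 
illustration: the dictionary of per-vdev costs, the function names, and the 
notion of scaling a base reservation by the worst ratio are assumptions for 
the example, and nothing here reflects an existing field in spa_t.

```python
from fractions import Fraction
import math


def worst_case_ratio(vdev_costs):
    """Return the largest allocated/predicted ratio across all vdevs.

    vdev_costs maps a vdev label to (predicted_bytes, allocated_bytes) for
    one volblock. Both the structure and the labels are hypothetical.
    """
    return max(Fraction(allocated, predicted)
               for predicted, allocated in vdev_costs.values())


def padded_refreservation(base_reservation, vdev_costs):
    """Scale a base reservation by the pool's worst-case ratio, rounding up."""
    ratio = max(worst_case_ratio(vdev_costs), Fraction(1))
    return math.ceil(base_reservation * ratio)


# Two raidz2 vdevs with volblocksize=8k: one on 4kn disks (16k predicted,
# 24k allocated) and one on 512n disks (12k predicted, 12k allocated).
costs = {"raidz2-0": (16384, 24576), "raidz2-1": (12288, 12288)}
print(worst_case_ratio(costs))                    # 3/2
print(padded_refreservation(10 * 2**30, costs))   # 16106127360 (15 GiB)
```

Taking the maximum over vdevs is deliberately pessimistic: the reservation is 
sized as if every block landed on the least efficient vdev.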

Human users often do not realize the defaults at play here. Also, far too 
many people are propagating the "always set ashift=12" virus, so it could be 
handy to print a warning in zfs(1m).
 -- richard



------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/Te3d593ba00521b6d-Ma355379a37d594b1204a7ecf