> On Mar 19, 2018, at 7:40 AM, Mike Gerdts <mike.ger...@joyent.com> wrote:
>
> On Fri, Mar 16, 2018 at 5:38 PM, Richard Elling
> <richard.ell...@richardelling.com> wrote:
>> On Mar 15, 2018, at 9:48 PM, Mike Gerdts <mike.ger...@joyent.com> wrote:
>> I had started down that route, then convinced myself that it may be there
>> for good reason. Can you explain how a refreservation greater than the size
>> calculated by zvol_volsize_to_reservation() is useful? I couldn't come up
>> with a way (aside from a potential bug in that copies is not always
>> accounted for).
>
> Sure, raidz skip blocks are not accounted for. In part this is because
> skip blocks are assigned at the SPA layer while reservations are handled
> at the DSL layer. The pathological example is raidz2 on 4kn disks
> with volblocksize=8k (default). The predicted reservation is 8k per block
> (logical) plus 8k parity = 16k, but the actual allocated space is 24k.
> The DSL "free" space assumes 16k, so it overestimates the usable space.
> Thus you can run out of allocated space in the pool before hitting
> refreservation -- a bad thing.
> One way to inoculate is to increase refreservation to a value greater than
> volsize.
>
> Thanks for that lesson. It seems as though it would be better to fix
> zvol_volsize_to_reservation() to account for this rather than to leave it to
> system operators to know about this and then each independently come up with
> the appropriate algorithm to reserve the right amount of space. Is the
> algorithm for performing the proper calculation written down somewhere that
> is publicly accessible?

Sure, the seminal blog post is:
https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz

Each vdev can have a different allocation ratio and raid config, but if you
pick the worst case for the pool at import/create time and store it in spa_t,
then it is almost handy.

Human users often do not realize which defaults are at play here. Also, there
are far too many people propagating the "always set ashift=12" virus, so it
could be handy to print a warning in zfs(1m).

 -- richard

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/discussions/Te3d593ba00521b6d-Ma355379a37d594b1204a7ecf
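
For anyone who wants to check the 16k vs. 24k numbers from the quoted
example, here is a minimal sketch of the arithmetic in Python. It follows
my understanding of how raidz sizes an allocation (data sectors, plus
parity per row, rounded up to a multiple of nparity + 1 so that skip
sectors fill the gap). The function names are made up for illustration,
and the 6-disk width is an assumption, since the example does not state
how wide the raidz2 vdev is.

```python
def naive_size(psize, ashift, ndisks, nparity):
    """Data plus parity only, ignoring skip sectors (the 16k estimate)."""
    sector = 1 << ashift
    data = -(-psize // sector)                   # ceil(psize / sector)
    rows = -(-data // (ndisks - nparity))        # stripe rows needed
    return (data + nparity * rows) * sector


def raidz_asize(psize, ashift, ndisks, nparity):
    """Allocated size including skip sectors (the 24k actual figure)."""
    sector = 1 << ashift
    data = -(-psize // sector)
    rows = -(-data // (ndisks - nparity))
    total = data + nparity * rows
    # Allocations are padded up to a multiple of (nparity + 1) sectors.
    mult = nparity + 1
    total = -(-total // mult) * mult
    return total * sector


if __name__ == "__main__":
    # Hypothetical layout: 6-wide raidz2, 4kn disks (ashift=12), 8k blocks.
    predicted = naive_size(8192, 12, 6, 2)
    actual = raidz_asize(8192, 12, 6, 2)
    print(predicted, actual, actual / predicted)
```

Taking the worst such ratio across the top-level vdevs is one way to get
the per-pool number mentioned above; this sketch only covers a single vdev.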