-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/163/#review530
-----------------------------------------------------------

Ship it!


Ship It!

- George Wilson


On Feb. 21, 2015, 5:34 a.m., Justin Gibbs wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.csiden.org/r/163/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2015, 5:34 a.m.)
> 
> 
> Review request for OpenZFS Developer Mailing List and Christopher Siden.
> 
> 
> Bugs: 5630
>     https://www.illumos.org/projects/illumos-gate//issues/5630
> 
> 
> Repository: illumos-gate
> 
> 
> Description
> -------
> 
> Fix data corruption in ZFS triggered by rapidly destroying
> and creating datasets while listing datasets in another
> context.
> 
> dnode_sync_free() resets the in-core dnode_t of a
> just-deleted object so that it represents an unallocated
> object.  The dnode_t will only be considered to represent
> a free dnode after its last hold is released.  This last
> hold may be dropped after dnode_sync_free() returns.  For
> data dbufs holding the dnode, the delay can be due to
> asynchronous evictions from the ARC coincident with the
> dnode being destroyed.  For bonus buffers, it can happen
> if the object type allows the object to be destroyed even
> while it is held in another context.
> 
> The observed data corruption occurred when a dsl_dir was
> destroyed for a recursive dataset destroy, while a "zfs
> list" operation also held this dsl_dir.  Although
> dnode_sync_free() calls dnode_evict_dbufs(), the hold on
> the dsl_dir's bonus buffer prevented it from being evicted.
> This left the bonus buffer associated with the dnode_t even
> after the last hold on the dnode was released.
> 
> Some time later, this dnode_t was reused for a new dsl_dir.
> Instead of receiving a new (and zero-filled) bonus buffer,
> the new dsl_dir was handed the contents of the old one.
> The dsl_dir initialization code assumes the bonus buffer is
> zero-filled and so only explicitly initializes fields that
> have non-zero initial values.  This allowed the stale data
> to be incorporated into the new dsl_dir and written to the
> media.
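> 
> To make that assumption concrete, here is a minimal sketch of
> the pattern (illustrative type and field names only, not the
> actual dsl_dir_phys_t definition or initialization code):
> 
> ```
> #include <stdint.h>
> 
> /* Sketch only: stand-in for an on-disk struct kept in a bonus buffer. */
> typedef struct sketch_dir_phys {
> 	uint64_t sd_creation_time;	/* non-zero at creation */
> 	uint64_t sd_head_dataset_obj;	/* non-zero at creation */
> 	uint64_t sd_used_bytes;		/* expected to start at zero */
> 	uint64_t sd_flags;		/* expected to start at zero */
> } sketch_dir_phys_t;
> 
> static void
> sketch_dir_init(sketch_dir_phys_t *sdp, uint64_t now, uint64_t headobj)
> {
> 	/*
> 	 * Only fields with non-zero initial values are written.  If the
> 	 * buffer was recycled with stale contents instead of being
> 	 * zero-filled, sd_used_bytes and sd_flags silently keep their
> 	 * old values and are eventually written to the media.
> 	 */
> 	sdp->sd_creation_time = now;
> 	sdp->sd_head_dataset_obj = headobj;
> }
> ```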
> 
> Bonus buffers are evicted in only two places: via
> dnode_evict_dbufs(), and then only when there are no holds
> on the bonus buffer, and via dnode_buf_pageout()/dnode_destroy().
> The fix employed here evicts the bonus buffer for a freed
> dnode when the bonus buffer's last hold is dropped, but
> before the bonus buffer's dnode hold is released.  This
> ensures the dnode_t can only be reused after the bonus
> buffer eviction completes.
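> 
> The ordering the fix enforces looks roughly like the following
> simplified userland model (assumed names and types; this is not
> the actual dbuf/dnode code):
> 
> ```
> #include <assert.h>
> #include <stdlib.h>
> 
> /* Simplified model of a dnode with an attached bonus buffer. */
> struct model_dnode {
> 	int	dn_holds;	/* outstanding holds on the dnode */
> 	int	dn_freed;	/* the object was freed in this txg */
> 	char	*dn_bonus;	/* attached bonus data, or NULL */
> };
> 
> static void
> model_dnode_rele(struct model_dnode *dn)
> {
> 	assert(dn->dn_holds > 0);
> 	dn->dn_holds--;		/* at zero, the dnode_t may be reused */
> }
> 
> /* Runs when the last hold on the bonus buffer is dropped. */
> static void
> model_bonus_last_rele(struct model_dnode *dn)
> {
> 	if (dn->dn_freed) {
> 		/* Evict the stale bonus contents first... */
> 		free(dn->dn_bonus);
> 		dn->dn_bonus = NULL;
> 	}
> 	/*
> 	 * ...and only then drop the bonus buffer's hold on the dnode,
> 	 * so the dnode_t cannot be reused while stale bonus data is
> 	 * still attached to it.
> 	 */
> 	model_dnode_rele(dn);
> }
> ```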
> 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h:
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
> 	Add dnode_rele_and_unlock(), which avoids an extra
> 	lock drop and acquisition in contexts where the
> 	dn_mtx is already held (see the sketch after this
> 	list of changes).
> 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
> 	In dbuf_rele_and_unlock(), immediately evict bonus
> 	buffers associated with freed dnodes.  The data in
> 	the bonus buffer is no longer relevant, and this
> 	prevents a stale bonus buffer from being associated
> 	with the dnode_t should the dnode_t be reused prior
> 	to being destroyed.
> 
> 	In dbuf_rele_and_unlock(), hold the bonus buffer's
> 	mutex until after DB_DNODE_EXIT().  This prevents
> 	another thread (e.g. dmu_objset_evict()->dnode_evict_dbufs())
> 	from evicting the bonus buffer and invalidating the
> 	dbuf's dnh before or during the DB_DNODE_ENTER/EXIT()
> 	calls.
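> 
> For reference, the lock-handoff shape of the new helper,
> sketched with generic names (a pthread mutex standing in for
> dn_mtx; this is not the actual diff):
> 
> ```
> #include <assert.h>
> #include <pthread.h>
> 
> /* Sketch with generic names; the real code operates on dnode_t/dn_mtx. */
> struct model_obj {
> 	pthread_mutex_t	mo_mtx;
> 	int		mo_holds;
> };
> 
> /* Caller already holds mo_mtx; drop one hold, then release the mutex. */
> static void
> model_rele_and_unlock(struct model_obj *o)
> {
> 	assert(o->mo_holds > 0);
> 	o->mo_holds--;
> 	pthread_mutex_unlock(&o->mo_mtx);
> }
> 
> /* Wrapper for callers that do not already hold the mutex. */
> static void
> model_rele(struct model_obj *o)
> {
> 	pthread_mutex_lock(&o->mo_mtx);
> 	model_rele_and_unlock(o);
> }
> ```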
> 
> 
> Diffs
> -----
> 
>   usr/src/uts/common/fs/zfs/dnode.c 4e376fc4d920fcc3f3429a359ed394952d116b42 
>   usr/src/uts/common/fs/zfs/dbuf.c c735c9694ff905586a26d33d6ffd36cfa41ce4cb 
>   usr/src/uts/common/fs/zfs/sys/dnode.h 406954a65323b8de8aa28b07073f588bc1805062 
> 
> Diff: https://reviews.csiden.org/r/163/diff/
> 
> 
> Testing
> -------
> 
> ZFS test suite on FreeBSD. ztest on illumos.
> 
> Continuous loop on FreeBSD of recursively receiving and destroying a tree of 
> 18 file systems while running management code that responds to ZFS 
> create/destroy events and enumerates file systems.  Prior to this fix, this 
> test would fail within 1 hour; it has now run for 2.5 days without failure.
> 
> Additional stress test:
> Thread 1:
> ```
> while true; do
>     zpool create foo da1
>     zpool export foo
>     zpool import foo
>     zpool destroy foo
> done
> ```
> 
> Thread 2:
> ```
> while true; do
>     zpool status foo >& /dev/null
> done
> ```
> 
> 
> Thanks,
> 
> Justin Gibbs
> 
>

