Re: svn commit: r304180 - head/sys/ufs/ffs
On Wed, 17 Aug 2016, Konstantin Belousov wrote: On Tue, Aug 16, 2016 at 11:53:55PM +0200, Jilles Tjoelker wrote: Hmm, some people interpret POSIX differently than I do, but I think it is clear that XBD 3.384 Synchronized I/O Data Integrity Completion requires the indirect blocks to be written (because "all file system information required to retrieve the data is successfully transferred"). The Linux man page matches this interpretation except that an fsync of the parent directory may be required. All related directories must be synced. It is almost useless to sync the data if the data cannot be found later using normal file system operations. Not quite useless if the file can be recovered by fsck and saved in lost+found, but the metadata needed to guarantee that that works probably requires a lot of writes anyway. If the whole file is being considered, then any change to indirect blocks is needed to access some data (because the file was extended, a hole was filled or snapshotted data was overwritten). For other overwrites, there is no change to indirect blocks and no conflict with fdatasync's performance objectives. Note that the same argument is applicable to the inode block: the di_db and di_ib pointers to the direct blocks and to root indirect blocks are required to retrieve the written data. I would never have guessed that you left out either. ... Ideally, a changed i_size would also cause the inode to be written, but I don't know how to determine that (there is no i_flag for it). Always writing the inode would defeat the point of fdatasync. My version does a bcmp() of the in-core copy with the disk copy to avoid null changes. This turned out to be not very useful. It was intended mainly to avoid writing null changes for timestamps. By I already avoid most timestamp changes by mounting with -noatime and not having large data. I use only seconds granularity for timestamps. With nanoseconds granularity, any change to a file is guaranteed to change its in-core inode, so the bcmp() wouldn't work. But timestamps are about the only metadata that obviously doesn't need syncing. Especially atimes. POSIX doesn't allow turning off atimes, but it also doesn't require syncing them before the system crashes except probably using FSYNC of metadata. There is an i_flag for critical metadata like i_size. It is IN_MODIFIED. This flag is often mismanaged. Many places set only IN_CHANGE after making a critical change. ufs_chown() is one such place. Ownerships are fairly critical metadata. They are not considered critical enough to sync immediately, but they should be synced by FSYNC of metadata. But the implementation is to set only IN_CHANGE. i_ctime is usually updated much later and IN_MODIFIED is set then to say that the relatively unimportant i_ctime has been modified. This inhibits the optimization of syncing for chown() but not syncing for ctime changes. Which was my reasoning behind omitting the indirect block flushing. There is no practical difference between inode block and indirect blocks for metadata consistency. Ugh. Note that the affected case is only the async mounts (not sync mounts, not any variants of softdeps). I fixed the non-syncing of inodes for the async mount case ~15 years ago in my version, and finally merged the changes to my version of -current a few months OK. It was stupid that ordinary fsync() synced the indirect blocks but not the more important inode. Directory metadata is still not synced by fsync() in the async mount case. I can restore indirect block flushing and wait for them for DATA_ONLY sync, but then I should also add ffs_update() call. Here are my ffs fixes for -current. The are mostly to sprinkle missing DOINGASYNC() checks and finally fix ffs_update(): X Index: ffs/ffs_balloc.c X === X --- ffs/ffs_balloc.c (revision 304069) X +++ ffs/ffs_balloc.c (working copy) X @@ -155,4 +155,6 @@ X if (flags & IO_SYNC) X bwrite(bp); X + else if (DOINGASYNC(vp)) X + bdwrite(bp); X else X bawrite(bp); X @@ -265,12 +267,10 @@ X newb, 0, fs->fs_bsize, 0, bp); X bdwrite(bp); X + } else if ((flags & IO_SYNC) == 0 && DOINGASYNC(vp)) { X + if (bp->b_bufsize == fs->fs_bsize) X + bp->b_flags |= B_CLUSTEROK; X + bdwrite(bp); This restructures the code a little to fix the DOINGASYNC() check. IO_SYNC was not checked. X } else { X - /* X - * Write synchronously so that indirect blocks X - * never point at garbage. X - */ X - if (DOINGASYNC(vp)) X - bdwrite(bp); X -
Re: svn commit: r304180 - head/sys/ufs/ffs
On Tue, Aug 16, 2016 at 11:53:55PM +0200, Jilles Tjoelker wrote: > Hmm, some people interpret POSIX differently than I do, but I think it > is clear that XBD 3.384 Synchronized I/O Data Integrity Completion > requires the indirect blocks to be written (because "all file system > information required to retrieve the data is successfully transferred"). > The Linux man page matches this interpretation except that an fsync of > the parent directory may be required. > > If the whole file is being considered, then any change to indirect > blocks is needed to access some data (because the file was extended, a > hole was filled or snapshotted data was overwritten). For other > overwrites, there is no change to indirect blocks and no conflict with > fdatasync's performance objectives. Note that the same argument is applicable to the inode block: the di_db and di_ib pointers to the direct blocks and to root indirect blocks are required to retrieve the written data. ... > Ideally, a changed i_size would also cause the inode to be written, but > I don't know how to determine that (there is no i_flag for it). Always > writing the inode would defeat the point of fdatasync. Which was my reasoning behind omitting the indirect block flushing. There is no practical difference between inode block and indirect blocks for metadata consistency. Note that the affected case is only the async mounts (not sync mounts, not any variants of softdeps). I can restore indirect block flushing and wait for them for DATA_ONLY sync, but then I should also add ffs_update() call. ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r304180 - head/sys/ufs/ffs
On Mon, Aug 15, 2016 at 07:22:24PM +, Konstantin Belousov wrote: > [snip] > @@ -254,15 +259,23 @@ loop: > if ((bp->b_vflags & BV_SCANNED) != 0) > continue; > bp->b_vflags |= BV_SCANNED; > - /* Flush indirects in order. */ > + /* > + * Flush indirects in order, if requested. > + * > + * Note that if only datasync is requested, we can > + * skip indirect blocks when softupdates are not > + * active. Otherwise we must flush them with data, > + * since dependencies prevent data block writes. > + */ > if (waitfor == MNT_WAIT && bp->b_lblkno <= -NDADDR && > - lbn_level(bp->b_lblkno) >= passes) > + (lbn_level(bp->b_lblkno) >= passes || > + ((flags & DATA_ONLY) != 0 && !DOINGSOFTDEP(vp > continue; > if (bp->b_lblkno > lbn) > panic("ffs_syncvnode: syncing truncated data."); > if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) { > BO_UNLOCK(bo); > - } else if (wait != 0) { > + } else if (wait) { > if (BUF_LOCK(bp, > LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK, > BO_LOCKPTR(bo)) != 0) { Hmm, some people interpret POSIX differently than I do, but I think it is clear that XBD 3.384 Synchronized I/O Data Integrity Completion requires the indirect blocks to be written (because "all file system information required to retrieve the data is successfully transferred"). The Linux man page matches this interpretation except that an fsync of the parent directory may be required. If the whole file is being considered, then any change to indirect blocks is needed to access some data (because the file was extended, a hole was filled or snapshotted data was overwritten). For other overwrites, there is no change to indirect blocks and no conflict with fdatasync's performance objectives. > @@ -330,31 +343,59 @@ next: >* these will be done with one sync and one async pass. >*/ > if (bo->bo_dirty.bv_cnt > 0) { > - /* Write the inode after sync passes to flush deps. */ > - if (wait && DOINGSOFTDEP(vp) && (flags & NO_INO_UPDT) == 0) { > - BO_UNLOCK(bo); > - ffs_update(vp, 1); > - BO_LOCK(bo); > + if ((flags & DATA_ONLY) == 0) { > + still_dirty = true; > + } else { > + /* > + * For data-only sync, dirty indirect buffers > + * are ignored. > + */ > + still_dirty = false; > + TAILQ_FOREACH(bp, &bo->bo_dirty.bv_hd, b_bobufs) { > + if (bp->b_lblkno > -NDADDR) { > + still_dirty = true; > + break; > + } > + } > } > - /* switch between sync/async. */ > - wait = !wait; > - if (wait == 1 || ++passes < NIADDR + 2) > - goto loop; > + > + if (still_dirty) { > + /* Write the inode after sync passes to flush deps. */ > + if (wait && DOINGSOFTDEP(vp) && > + (flags & NO_INO_UPDT) == 0) { > + BO_UNLOCK(bo); > + ffs_update(vp, 1); > + BO_LOCK(bo); > + } > + /* switch between sync/async. */ > + wait = !wait; > + if (wait || ++passes < NIADDR + 2) > + goto loop; > #ifdef INVARIANTS > - if (!vn_isdisk(vp, NULL)) > - vn_printf(vp, "ffs_fsync: dirty "); > + if (!vn_isdisk(vp, NULL)) > + vn_printf(vp, "ffs_fsync: dirty "); > #endif > + } > } > BO_UNLOCK(bo); > error = 0; > - if ((flags & NO_INO_UPDT) == 0) > - error = ffs_update(vp, 1); > - if (DOINGSUJ(vp)) > - softdep_journal_fsync(VTOI(vp)); > + if ((flags & DATA_ONLY) == 0) { > + if ((flags & NO_INO_UPDT) == 0) > + error = ffs_update(vp, 1); > + if (DOINGSUJ(vp)) > + softdep_journal_fsync(VTOI(vp)); > + } > return (error); > } Ideally, a changed i_size would also cause the inode to be written, but I don't know how to determine that (there is no i_flag for it). Always writing the inode would defeat the point of fdatasync. -- Jilles Tjoelker ___ svn-src-head@freebsd.org maili
svn commit: r304180 - head/sys/ufs/ffs
Author: kib Date: Mon Aug 15 19:22:23 2016 New Revision: 304180 URL: https://svnweb.freebsd.org/changeset/base/304180 Log: Implement VOP_FDATASYNC() for UFS. Reviewed by: mckusick Tested by:pho Sponsored by: The FreeBSD Foundation MFC after:2 weeks Differential revision:https://reviews.freebsd.org/D7471 Modified: head/sys/ufs/ffs/ffs_extern.h head/sys/ufs/ffs/ffs_vnops.c Modified: head/sys/ufs/ffs/ffs_extern.h == --- head/sys/ufs/ffs/ffs_extern.h Mon Aug 15 19:18:10 2016 (r304179) +++ head/sys/ufs/ffs/ffs_extern.h Mon Aug 15 19:22:23 2016 (r304180) @@ -174,6 +174,11 @@ void softdep_freework(struct workhead *) * deadlock when flushing snapshot inodes while holding snaplk. */ #defineNO_INO_UPDT 0x0001 +/* + * Request data sync only from ffs_syncvnode(), not touching even more + * metadata than NO_INO_UPDT. + */ +#defineDATA_ONLY 0x0002 intffs_rdonly(struct inode *); Modified: head/sys/ufs/ffs/ffs_vnops.c == --- head/sys/ufs/ffs/ffs_vnops.cMon Aug 15 19:18:10 2016 (r304179) +++ head/sys/ufs/ffs/ffs_vnops.cMon Aug 15 19:22:23 2016 (r304180) @@ -103,6 +103,7 @@ __FBSDID("$FreeBSD$"); extern int ffs_rawread(struct vnode *vp, struct uio *uio, int *workdone); #endif static vop_fsync_t ffs_fsync; +static vop_fdatasync_t ffs_fdatasync; static vop_lock1_t ffs_lock; static vop_read_t ffs_read; static vop_write_t ffs_write; @@ -123,6 +124,7 @@ static vop_vptofh_t ffs_vptofh; struct vop_vector ffs_vnodeops1 = { .vop_default = &ufs_vnodeops, .vop_fsync =ffs_fsync, + .vop_fdatasync =ffs_fdatasync, .vop_getpages = vnode_pager_local_getpages, .vop_getpages_async = vnode_pager_local_getpages_async, .vop_lock1 =ffs_lock, @@ -135,6 +137,7 @@ struct vop_vector ffs_vnodeops1 = { struct vop_vector ffs_fifoops1 = { .vop_default = &ufs_fifoops, .vop_fsync =ffs_fsync, + .vop_fdatasync =ffs_fdatasync, .vop_reallocblks = ffs_reallocblks, /* XXX: really ??? */ .vop_vptofh = ffs_vptofh, }; @@ -143,6 +146,7 @@ struct vop_vector ffs_fifoops1 = { struct vop_vector ffs_vnodeops2 = { .vop_default = &ufs_vnodeops, .vop_fsync =ffs_fsync, + .vop_fdatasync =ffs_fdatasync, .vop_getpages = vnode_pager_local_getpages, .vop_getpages_async = vnode_pager_local_getpages_async, .vop_lock1 =ffs_lock, @@ -161,6 +165,7 @@ struct vop_vector ffs_vnodeops2 = { struct vop_vector ffs_fifoops2 = { .vop_default = &ufs_fifoops, .vop_fsync =ffs_fsync, + .vop_fdatasync =ffs_fdatasync, .vop_lock1 =ffs_lock, .vop_reallocblks = ffs_reallocblks, .vop_strategy = ffsext_strategy, @@ -216,10 +221,10 @@ ffs_syncvnode(struct vnode *vp, int wait { struct inode *ip; struct bufobj *bo; - struct buf *bp; - struct buf *nbp; + struct buf *bp, *nbp; ufs_lbn_t lbn; - int error, wait, passes; + int error, passes; + bool still_dirty, wait; ip = VTOI(vp); ip->i_flag &= ~IN_NEEDSYNC; @@ -238,7 +243,7 @@ ffs_syncvnode(struct vnode *vp, int wait */ error = 0; passes = 0; - wait = 0; /* Always do an async pass first. */ + wait = false; /* Always do an async pass first. */ lbn = lblkno(ip->i_fs, (ip->i_size + ip->i_fs->fs_bsize - 1)); BO_LOCK(bo); loop: @@ -254,15 +259,23 @@ loop: if ((bp->b_vflags & BV_SCANNED) != 0) continue; bp->b_vflags |= BV_SCANNED; - /* Flush indirects in order. */ + /* +* Flush indirects in order, if requested. +* +* Note that if only datasync is requested, we can +* skip indirect blocks when softupdates are not +* active. Otherwise we must flush them with data, +* since dependencies prevent data block writes. +*/ if (waitfor == MNT_WAIT && bp->b_lblkno <= -NDADDR && - lbn_level(bp->b_lblkno) >= passes) + (lbn_level(bp->b_lblkno) >= passes || + ((flags & DATA_ONLY) != 0 && !DOINGSOFTDEP(vp continue; if (bp->b_lblkno > lbn) panic("ffs_syncvnode: syncing truncated data."); if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) == 0) { BO_UNLOCK(bo); -