Re: [PATCH v3] nilfs2: add a tracepoint for tracking stage transition of segment construction
On Sun, 14 Sep 2014 00:14:36 +0900, Mitake Hitoshi wrote: > This patch adds a tracepoint for tracking stage transition of block > collection in segment construction. With the tracepoint, we can > analysis the behavior of segment construction in depth. It would be > useful for bottleneck detection and debugging, etc. > > The tracepoint is created with the standard trace API of linux (like > ext3, ext4, f2fs and btrfs). So we can analysis with existing tools > easily. Of course, more detailed analysis will be possible if we can > create nilfs specific analysis tools. > > Below is an example of event dump with Brendan Gregg's perf-tools > (https://github.com/brendangregg/perf-tools). Time consumption between > each stage can be obtained. > > $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition > Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. > segctord-14875 [003] ...1 28311.067794: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT > segctord-14875 [003] ...1 28311.068139: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC > segctord-14875 [003] ...1 28311.068139: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE > segctord-14875 [003] ...1 28311.068486: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE > segctord-14875 [003] ...1 28311.068540: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE > segctord-14875 [003] ...1 28311.068561: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE > segctord-14875 [003] ...1 28311.068565: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT > segctord-14875 [003] ...1 28311.068573: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR > segctord-14875 [003] ...1 28311.068574: > nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE > > For capturing transition correctly, this patch adds wrappers for the > member scnt of nilfs_cstage. With this change, every transition of the > stage can produce trace event in a correct manner. > > Of course the tracepoint added by this patch is very limited, so we > need to add more points for detailed analysis. This patch is something > like demonstration. If this concept is acceptable for the nilfs > community, I'd like to add more tracepoints and prepare analysis > tools. > > Signed-off-by: Hitoshi Mitake > --- > fs/nilfs2/segment.c | 71 > +++ > fs/nilfs2/segment.h | 3 +- > include/trace/events/nilfs2.h | 50 ++ > 3 files changed, 103 insertions(+), 21 deletions(-) > create mode 100644 include/trace/events/nilfs2.h > > v3: undo rename > > v2: correct the email address of author > Looks good. I pushed out this patch as "tracepoints" branch of nilfs2.git[1]. If you hope it to be sent to upstream separately at this time, please let me know. (In that case, the following description should be revised to be ready for mainline merge). > Of course the tracepoint added by this patch is very limited, so we > need to add more points for detailed analysis. This patch is something > like demonstration. If this concept is acceptable for the nilfs > community, I'd like to add more tracepoints and prepare analysis > tools. Otherwise, I'll keep it in the tracepoints branch for now. [1] https://github.com/konis/nilfs2.git Thanks, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
On Sat, 13 Sep 2014 16:37:53 +0200, Andreas Rohner wrote: > Under normal circumstances nilfs_sync_fs() writes out the super block, > which causes a flush of the underlying block device. But this depends on > the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the > last segment crosses a segment boundary. So if only a small amount of > data is written before the call to nilfs_sync_fs(), no flush of the > block device occurs. > > In the above case an additional call to blkdev_issue_flush() is needed. > To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device > is introduced, which is cleared whenever new logs are written and set > whenever the block device is flushed. For convenience the function > nilfs_flush_device() is added, which contains the above logic. > > Signed-off-by: Andreas Rohner Applied, thank you! Ryusuke Konishi > --- > fs/nilfs2/file.c | 8 +++- > fs/nilfs2/ioctl.c | 8 +++- > fs/nilfs2/segment.c | 3 +++ > fs/nilfs2/super.c | 6 ++ > fs/nilfs2/the_nilfs.h | 22 ++ > 5 files changed, 37 insertions(+), 10 deletions(-) > > diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c > index 2497815..e9e3325 100644 > --- a/fs/nilfs2/file.c > +++ b/fs/nilfs2/file.c > @@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > mutex_unlock(&inode->i_mutex); > > nilfs = inode->i_sb->s_fs_info; > - if (!err && nilfs_test_opt(nilfs, BARRIER)) { > - err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > - if (err != -EIO) > - err = 0; > - } > + if (!err) > + err = nilfs_flush_device(nilfs); > + > return err; > } > > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c > index 422fb54..9a20e51 100644 > --- a/fs/nilfs2/ioctl.c > +++ b/fs/nilfs2/ioctl.c > @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, > struct file *filp, > return ret; > > nilfs = inode->i_sb->s_fs_info; > - if (nilfs_test_opt(nilfs, BARRIER)) { > - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > - if (ret == -EIO) > - return ret; > - } > + ret = nilfs_flush_device(nilfs); > + if (ret < 0) > + return ret; > > if (argp != NULL) { > down_read(&nilfs->ns_segctor_sem); > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c > index a1a1916..0b7d2ca 100644 > --- a/fs/nilfs2/segment.c > +++ b/fs/nilfs2/segment.c > @@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct > nilfs_sc_info *sci) > nilfs_set_next_segment(nilfs, segbuf); > > if (update_sr) { > + nilfs->ns_flushed_device = 0; > nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start, > segbuf->sb_sum.seg_seq, nilfs->ns_cno++); > > @@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block > *sb, struct inode *inode, > sci->sc_dsync_end = end; > > err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC); > + if (!err) > + nilfs->ns_flushed_device = 0; > > nilfs_transaction_unlock(sb); > return err; > diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c > index 228f5bd..2e5b3ec 100644 > --- a/fs/nilfs2/super.c > +++ b/fs/nilfs2/super.c > @@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag) > nilfs->ns_sbsize)); > } > clear_nilfs_sb_dirty(nilfs); > + nilfs->ns_flushed_device = 1; > + /* make sure store to ns_flushed_device cannot be reordered */ > + smp_wmb(); > return nilfs_sync_super(sb, flag); > } > > @@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait) > } > up_write(&nilfs->ns_sem); > > + if (!err) > + err = nilfs_flush_device(nilfs); > + > return err; > } > > diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h > index d01ead1..23778d3 100644 > --- a/fs/nilfs2/the_nilfs.h > +++ b/fs/nilfs2/the_nilfs.h > @@ -46,6 +46,7 @@ enum { > /** > * struct the_nilfs - struct to supervise multiple nilfs mount points > * @ns_flags: flags > + * @ns_flushed_device: flag indicating if all volatile data was flushed > * @ns_bdev: block device > * @ns_sem: semaphore for shared states > * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts > @@ -103,6 +104,7 @@ enum { > */ > struct the_nilfs { > unsigned long ns_flags; > + int ns_flushed_device; > > struct block_device*ns_bdev; > struct rw_semaphore ns_sem; > @@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct > the_nilfs *nilfs, __u64 n) > return n == nilfs->ns_segnum || n == nilfs->ns_nextnum; > } > > +static inline int nilfs_flush_device(struct the_nilfs *
[PATCH v3] nilfs2: add a tracepoint for tracking stage transition of segment construction
This patch adds a tracepoint for tracking stage transition of block collection in segment construction. With the tracepoint, we can analysis the behavior of segment construction in depth. It would be useful for bottleneck detection and debugging, etc. The tracepoint is created with the standard trace API of linux (like ext3, ext4, f2fs and btrfs). So we can analysis with existing tools easily. Of course, more detailed analysis will be possible if we can create nilfs specific analysis tools. Below is an example of event dump with Brendan Gregg's perf-tools (https://github.com/brendangregg/perf-tools). Time consumption between each stage can be obtained. $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE For capturing transition correctly, this patch adds wrappers for the member scnt of nilfs_cstage. With this change, every transition of the stage can produce trace event in a correct manner. Of course the tracepoint added by this patch is very limited, so we need to add more points for detailed analysis. This patch is something like demonstration. If this concept is acceptable for the nilfs community, I'd like to add more tracepoints and prepare analysis tools. Signed-off-by: Hitoshi Mitake --- fs/nilfs2/segment.c | 71 +++ fs/nilfs2/segment.h | 3 +- include/trace/events/nilfs2.h | 50 ++ 3 files changed, 103 insertions(+), 21 deletions(-) create mode 100644 include/trace/events/nilfs2.h v3: undo rename v2: correct the email address of author diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index a1a1916..0fcf8e7 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -76,6 +76,36 @@ enum { NILFS_ST_DONE, }; +#define CREATE_TRACE_POINTS +#include + +/* + * nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), nilfs_sc_cstage_get() are + * wrapper functions of stage count (nilfs_sc_info->sc_stage.scnt). Users of + * the variable must use them because transition of stage count must involve + * trace events (trace_nilfs2_collection_stage_transition). + * + * nilfs_sc_cstage_get() isn't required for the above purpose because it doesn't + * produce tracepoint events. It is provided just for making the intention + * clear. + */ +static inline void nilfs_sc_cstage_inc(struct nilfs_sc_info *sci) +{ + sci->sc_stage.scnt++; + trace_nilfs2_collection_stage_transition(sci); +} + +static inline void nilfs_sc_cstage_set(struct nilfs_sc_info *sci, int next_scnt) +{ + sci->sc_stage.scnt = next_scnt; + trace_nilfs2_collection_stage_transition(sci); +} + +static inline int nilfs_sc_cstage_get(struct nilfs_sc_info *sci) +{ + return sci->sc_stage.scnt; +} + /* State flags of collection */ #define NILFS_CF_NODE 0x0001 /* Collecting node blocks */ #define NILFS_CF_IFILE_STARTED 0x0002 /* IFILE stage has started */ @@ -1055,7 +1085,7 @@ static int nilfs_segctor_collect_blocks(struct nilfs_sc_info *sci, int mode) size_t ndone; int err = 0; - switch (sci->sc_stage.scnt) { + switch (nilfs_sc_cstage_get(sci)) { case NILFS_ST_INIT: /* Pre-processes */ sci->sc_stage.flags = 0; @@ -1064,7 +1094,7 @@ static int nilfs_segctor_collect_blocks(struct nilfs_sc_info *sci, int mode) sci->sc_nblk_inc = 0; sci->sc_curseg->sb_sum.flags = NILFS_SS_LOGBGN; if (mode == SC_LSEG_DSYNC) { - sci->sc_stage.scnt = NILFS_ST_DSYNC; + nilfs_sc_cstage_set(sci, NILFS_ST_DSYNC); goto dsync_mode; } } @@ -1072,10 +1102,10 @@ static int nilfs_segctor_collect
Re: [PATCH v2] nilfs2: add a tracepoint for tracking stage transition of segment construction
On Fri, Sep 5, 2014 at 3:40 PM, Vyacheslav Dubeyko wrote: > On Fri, 2014-09-05 at 11:41 +0900, Hitoshi Mitake wrote: > > [snip] >> diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h >> index 38a1d00..1e9b931 100644 >> --- a/fs/nilfs2/segment.h >> +++ b/fs/nilfs2/segment.h >> @@ -66,13 +66,14 @@ struct nilfs_recovery_info { >> >> /** >> * struct nilfs_cstage - Context of collection stage >> - * @scnt: Stage count >> + * @__scnt: Stage count, must be accessed via wrappers: >> + * nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), >> nilfs_sc_cstage_get() >> * @flags: State flags >> * @dirty_file_ptr: Pointer on dirty_files list, or inode of a target file >> * @gc_inode_ptr: Pointer on the list of gc-inodes >> */ >> struct nilfs_cstage { >> - int scnt; >> + int __scnt; > > It doesn't make sense this change for my taste. As far as I can judge, > you've made this change because scnt field should be accessed by means > of wrappers. But __scnt name doesn't provide from direct using this > field. So, I suppose that extended comment will be enough. OK, I'll rename it in v3. > > By the way, do you plan to add tracepoints for another NILFS2 > subsystems? Yes. Thanks, Hitoshi > > Thanks, > Vyacheslav Dubeyko. > > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
Under normal circumstances nilfs_sync_fs() writes out the super block, which causes a flush of the underlying block device. But this depends on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the last segment crosses a segment boundary. So if only a small amount of data is written before the call to nilfs_sync_fs(), no flush of the block device occurs. In the above case an additional call to blkdev_issue_flush() is needed. To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device is introduced, which is cleared whenever new logs are written and set whenever the block device is flushed. For convenience the function nilfs_flush_device() is added, which contains the above logic. Signed-off-by: Andreas Rohner --- fs/nilfs2/file.c | 8 +++- fs/nilfs2/ioctl.c | 8 +++- fs/nilfs2/segment.c | 3 +++ fs/nilfs2/super.c | 6 ++ fs/nilfs2/the_nilfs.h | 22 ++ 5 files changed, 37 insertions(+), 10 deletions(-) diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c index 2497815..e9e3325 100644 --- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) mutex_unlock(&inode->i_mutex); nilfs = inode->i_sb->s_fs_info; - if (!err && nilfs_test_opt(nilfs, BARRIER)) { - err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); - if (err != -EIO) - err = 0; - } + if (!err) + err = nilfs_flush_device(nilfs); + return err; } diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 422fb54..9a20e51 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp, return ret; nilfs = inode->i_sb->s_fs_info; - if (nilfs_test_opt(nilfs, BARRIER)) { - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); - if (ret == -EIO) - return ret; - } + ret = nilfs_flush_device(nilfs); + if (ret < 0) + return ret; if (argp != NULL) { down_read(&nilfs->ns_segctor_sem); diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index a1a1916..0b7d2ca 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci) nilfs_set_next_segment(nilfs, segbuf); if (update_sr) { + nilfs->ns_flushed_device = 0; nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start, segbuf->sb_sum.seg_seq, nilfs->ns_cno++); @@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, sci->sc_dsync_end = end; err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC); + if (!err) + nilfs->ns_flushed_device = 0; nilfs_transaction_unlock(sb); return err; diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 228f5bd..2e5b3ec 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag) nilfs->ns_sbsize)); } clear_nilfs_sb_dirty(nilfs); + nilfs->ns_flushed_device = 1; + /* make sure store to ns_flushed_device cannot be reordered */ + smp_wmb(); return nilfs_sync_super(sb, flag); } @@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait) } up_write(&nilfs->ns_sem); + if (!err) + err = nilfs_flush_device(nilfs); + return err; } diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index d01ead1..23778d3 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -46,6 +46,7 @@ enum { /** * struct the_nilfs - struct to supervise multiple nilfs mount points * @ns_flags: flags + * @ns_flushed_device: flag indicating if all volatile data was flushed * @ns_bdev: block device * @ns_sem: semaphore for shared states * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts @@ -103,6 +104,7 @@ enum { */ struct the_nilfs { unsigned long ns_flags; + int ns_flushed_device; struct block_device*ns_bdev; struct rw_semaphore ns_sem; @@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs *nilfs, __u64 n) return n == nilfs->ns_segnum || n == nilfs->ns_nextnum; } +static inline int nilfs_flush_device(struct the_nilfs *nilfs) +{ + int err; + + if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device) + return 0; + + nilfs->ns_flushed_device = 1; + /* +* the store to ns_flushed_device must not be reordered after +* blkdev_
[PATCH v6 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
Hi, I have looked a bit more into the semantics of the various flags concerning block device caching behaviour. According to "Documentation/block/writeback_cache_control.txt" a call to blkdev_issue_flush() is equivalent to an empty bio with the REQ_FLUSH flag set. So there is no need to call blkdev_issue_flush() after a call to nilfs_commit_super(). But if there is no need to write the super block an additional call to blkdev_issue_flush() is necessary. To avoid an overhead I introduced the nilfs->ns_flushed_device flag, which is set to 0 whenever new logs are written and set to 1 whenever the block device is flushed. If the super block was written during segment construction or in nilfs_sync_fs(), then blkdev_issue_flush() is not called. br, Andreas Rohner v5->v6 (review by Ryusuke Konishi) * Remove special handling of EIO error state from nilfs_ioctl_sync() v4->v5 (review by Ryusuke Konishi) * Move device flushing logic into separate function * Fix invalid comment * Move clearing of the flag to nilfs_segctor_complete_write() and nilfs_construct_dsync_segment() v3->v4 (review by Ryusuke Konishi) * Replace atomic_t with int for ns_flushed_device * Use smp_wmb() to guarantee correct ordering v2->v3 (review of Ryusuke Konishi) * Use separate atomic flag for ns_flushed_device instead of a bit flag in ns_flags * Use smp_mb__after_atomic() after setting ns_flushed_device v1->v2 * Add new flag THE_NILFS_FLUSHED Andreas Rohner (1): nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() fs/nilfs2/file.c | 8 +++- fs/nilfs2/ioctl.c | 8 +++- fs/nilfs2/segment.c | 3 +++ fs/nilfs2/super.c | 6 ++ fs/nilfs2/the_nilfs.h | 22 ++ 5 files changed, 37 insertions(+), 10 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
On Sat, 13 Sep 2014 15:55:56 +0200, Andreas Rohner wrote: > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c > index 422fb54..5a530f3 100644 > --- a/fs/nilfs2/ioctl.c > +++ b/fs/nilfs2/ioctl.c > @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, > struct file *filp, > return ret; > > nilfs = inode->i_sb->s_fs_info; > - if (nilfs_test_opt(nilfs, BARRIER)) { > - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > - if (ret == -EIO) > - return ret; > - } > + ret = nilfs_flush_device(nilfs); > + if (ret == -EIO) > + return ret; One more comment. I think this special treatment of EIO should be encapsulated in nilfs_flush_device(). nilfs_ioctl_sync() doesn't have to know it: if (ret < 0) return ret; Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
Under normal circumstances nilfs_sync_fs() writes out the super block, which causes a flush of the underlying block device. But this depends on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the last segment crosses a segment boundary. So if only a small amount of data is written before the call to nilfs_sync_fs(), no flush of the block device occurs. In the above case an additional call to blkdev_issue_flush() is needed. To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device is introduced, which is cleared whenever new logs are written and set whenever the block device is flushed. For convenience the function nilfs_flush_device() is added, which contains the above logic. Signed-off-by: Andreas Rohner --- fs/nilfs2/file.c | 8 +++- fs/nilfs2/ioctl.c | 8 +++- fs/nilfs2/segment.c | 3 +++ fs/nilfs2/super.c | 6 ++ fs/nilfs2/the_nilfs.h | 22 ++ 5 files changed, 37 insertions(+), 10 deletions(-) diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c index 2497815..e9e3325 100644 --- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) mutex_unlock(&inode->i_mutex); nilfs = inode->i_sb->s_fs_info; - if (!err && nilfs_test_opt(nilfs, BARRIER)) { - err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); - if (err != -EIO) - err = 0; - } + if (!err) + err = nilfs_flush_device(nilfs); + return err; } diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 422fb54..5a530f3 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp, return ret; nilfs = inode->i_sb->s_fs_info; - if (nilfs_test_opt(nilfs, BARRIER)) { - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); - if (ret == -EIO) - return ret; - } + ret = nilfs_flush_device(nilfs); + if (ret == -EIO) + return ret; if (argp != NULL) { down_read(&nilfs->ns_segctor_sem); diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index a1a1916..0b7d2ca 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci) nilfs_set_next_segment(nilfs, segbuf); if (update_sr) { + nilfs->ns_flushed_device = 0; nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start, segbuf->sb_sum.seg_seq, nilfs->ns_cno++); @@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, sci->sc_dsync_end = end; err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC); + if (!err) + nilfs->ns_flushed_device = 0; nilfs_transaction_unlock(sb); return err; diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 228f5bd..2e5b3ec 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag) nilfs->ns_sbsize)); } clear_nilfs_sb_dirty(nilfs); + nilfs->ns_flushed_device = 1; + /* make sure store to ns_flushed_device cannot be reordered */ + smp_wmb(); return nilfs_sync_super(sb, flag); } @@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait) } up_write(&nilfs->ns_sem); + if (!err) + err = nilfs_flush_device(nilfs); + return err; } diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index d01ead1..23778d3 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -46,6 +46,7 @@ enum { /** * struct the_nilfs - struct to supervise multiple nilfs mount points * @ns_flags: flags + * @ns_flushed_device: flag indicating if all volatile data was flushed * @ns_bdev: block device * @ns_sem: semaphore for shared states * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts @@ -103,6 +104,7 @@ enum { */ struct the_nilfs { unsigned long ns_flags; + int ns_flushed_device; struct block_device*ns_bdev; struct rw_semaphore ns_sem; @@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs *nilfs, __u64 n) return n == nilfs->ns_segnum || n == nilfs->ns_nextnum; } +static inline int nilfs_flush_device(struct the_nilfs *nilfs) +{ + int err; + + if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device) + return 0; + + nilfs->ns_flushed_device = 1; + /* +* the store to ns_flushed_device must not be reordered after +* blk
[PATCH v5 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
Hi, I have looked a bit more into the semantics of the various flags concerning block device caching behaviour. According to "Documentation/block/writeback_cache_control.txt" a call to blkdev_issue_flush() is equivalent to an empty bio with the REQ_FLUSH flag set. So there is no need to call blkdev_issue_flush() after a call to nilfs_commit_super(). But if there is no need to write the super block an additional call to blkdev_issue_flush() is necessary. To avoid an overhead I introduced the nilfs->ns_flushed_device flag, which is set to 0 whenever new logs are written and set to 1 whenever the block device is flushed. If the super block was written during segment construction or in nilfs_sync_fs(), then blkdev_issue_flush() is not called. br, Andreas Rohner v4->v5 (review by Ryusuke Konishi) * Move device flushing logic into separate function * Fix invalid comment * Move clearing of the flag to nilfs_segctor_complete_write() and nilfs_construct_dsync_segment() v3->v4 (review by Ryusuke Konishi) * Replace atomic_t with int for ns_flushed_device * Use smp_wmb() to guarantee correct ordering v2->v3 (review of Ryusuke Konishi) * Use separate atomic flag for ns_flushed_device instead of a bit flag in ns_flags * Use smp_mb__after_atomic() after setting ns_flushed_device v1->v2 * Add new flag THE_NILFS_FLUSHED Andreas Rohner (1): nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() fs/nilfs2/file.c | 8 +++- fs/nilfs2/ioctl.c | 8 +++- fs/nilfs2/segment.c | 3 +++ fs/nilfs2/super.c | 6 ++ fs/nilfs2/the_nilfs.h | 22 ++ 5 files changed, 37 insertions(+), 10 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
On Tue, 9 Sep 2014 23:17:09 +0200, Andreas Rohner wrote: > Under normal circumstances nilfs_sync_fs() writes out the super block, > which causes a flush of the underlying block device. But this depends on > the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the > last segment crosses a segment boundary. So if only a small amount of > data is written before the call to nilfs_sync_fs(), no flush of the > block device occurs. > > In the above case an additional call to blkdev_issue_flush() is needed. > To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device > is introduced, which is cleared whenever new logs are written and set > whenever the block device is flushed. > > Signed-off-by: Andreas Rohner > --- > fs/nilfs2/file.c | 10 +- > fs/nilfs2/ioctl.c | 10 +- > fs/nilfs2/segment.c | 4 > fs/nilfs2/super.c | 17 + > fs/nilfs2/the_nilfs.h | 2 ++ > 5 files changed, 41 insertions(+), 2 deletions(-) > > diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c > index 2497815..16375c2 100644 > --- a/fs/nilfs2/file.c > +++ b/fs/nilfs2/file.c > @@ -56,7 +56,15 @@ int nilfs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > mutex_unlock(&inode->i_mutex); > > nilfs = inode->i_sb->s_fs_info; > - if (!err && nilfs_test_opt(nilfs, BARRIER)) { > + if (!err && nilfs_test_opt(nilfs, BARRIER) && > + !nilfs->ns_flushed_device) { > + nilfs->ns_flushed_device = 1; > + /* > + * the store to ns_flushed_device must not be reordered after > + * blkdev_issue_flush > + */ > + smp_wmb(); > + > err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > if (err != -EIO) > err = 0; Looks good. But, these code lines and the comment appear repeatedly. To simplify these, I propose to add the following inline function to the_nilfs.h. @@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs *nilfs, __u64 n) return n == nilfs->ns_segnum || n == nilfs->ns_nextnum; } +static inline int nilfs_flush_device(struct the_nilfs *nilfs) +{ + int err; + + if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device) + return 0; + + nilfs->ns_flushed_device = 1; + /* +* the store to ns_flushed_device must not be reordered after +* blkdev_issue_flush(). +*/ + smp_wmb(); + + err = blkdev_issue_flush(nilfs->ns_bdev, GFP_KERNEL, NULL); + if (err != -EIO) + err = 0; + return err; +} + #endif /* _THE_NILFS_H */ > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c > index 422fb54..9444d5d 100644 > --- a/fs/nilfs2/ioctl.c > +++ b/fs/nilfs2/ioctl.c > @@ -1022,7 +1022,15 @@ static int nilfs_ioctl_sync(struct inode *inode, > struct file *filp, > return ret; > > nilfs = inode->i_sb->s_fs_info; > - if (nilfs_test_opt(nilfs, BARRIER)) { > + if (nilfs_test_opt(nilfs, BARRIER) && > + !nilfs->ns_flushed_device) { > + nilfs->ns_flushed_device = 1; > + /* > + * the store to ns_flushed_device must not be reordered after > + * blkdev_issue_flush > + */ > + smp_wmb(); > + > ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > if (ret == -EIO) > return ret; > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c > index a1a1916..379da1b 100644 > --- a/fs/nilfs2/segment.c > +++ b/fs/nilfs2/segment.c > @@ -1997,6 +1997,10 @@ static int nilfs_segctor_do_construct(struct > nilfs_sc_info *sci, int mode) > err = nilfs_segctor_wait(sci); > if (err) > goto failed_to_write; > + > + if (test_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags) || > + mode == SC_LSEG_DSYNC) > + nilfs->ns_flushed_device = 0; > } > } while (sci->sc_stage.scnt != NILFS_ST_DONE); We can simplify this by inserting "nilfs->ns_flushed_device = 0" in nilfs_segctor_complete_write() and nilfs_construct_dsync_segment() separately as follows. Explicit memory barriers are not required in the following changes because both nilfs_set_last_segment() and nilfs_transaction_unlock() imply a memory barrier. --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci) nilfs_set_next_segment(nilfs, segbuf); if (update_sr) { + nilfs->ns_flushed_device = 0; nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start, segbuf->sb_sum.seg_seq, nilfs->ns_cno++); @@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *