Re: [PATCH v3] nilfs2: add a tracepoint for tracking stage transition of segment construction

2014-09-13 Thread Ryusuke Konishi
On Sun, 14 Sep 2014 00:14:36 +0900, Mitake Hitoshi wrote:
> This patch adds a tracepoint for tracking stage transition of block
> collection in segment construction. With the tracepoint, we can
> analysis the behavior of segment construction in depth. It would be
> useful for bottleneck detection and debugging, etc.
> 
> The tracepoint is created with the standard trace API of linux (like
> ext3, ext4, f2fs and btrfs). So we can analysis with existing tools
> easily. Of course, more detailed analysis will be possible if we can
> create nilfs specific analysis tools.
> 
> Below is an example of event dump with Brendan Gregg's perf-tools
> (https://github.com/brendangregg/perf-tools). Time consumption between
> each stage can be obtained.
> 
> $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
> Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
> segctord-14875 [003] ...1 28311.067794: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT
> segctord-14875 [003] ...1 28311.068139: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC
> segctord-14875 [003] ...1 28311.068139: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE
> segctord-14875 [003] ...1 28311.068486: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE
> segctord-14875 [003] ...1 28311.068540: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE
> segctord-14875 [003] ...1 28311.068561: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE
> segctord-14875 [003] ...1 28311.068565: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT
> segctord-14875 [003] ...1 28311.068573: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR
> segctord-14875 [003] ...1 28311.068574: 
> nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE
> 
> For capturing transition correctly, this patch adds wrappers for the
> member scnt of nilfs_cstage. With this change, every transition of the
> stage can produce trace event in a correct manner.
> 
> Of course the tracepoint added by this patch is very limited, so we
> need to add more points for detailed analysis. This patch is something
> like demonstration. If this concept is acceptable for the nilfs
> community, I'd like to add more tracepoints and prepare analysis
> tools.
> 
> Signed-off-by: Hitoshi Mitake 
> ---
>  fs/nilfs2/segment.c   | 71 
> +++
>  fs/nilfs2/segment.h   |  3 +-
>  include/trace/events/nilfs2.h | 50 ++
>  3 files changed, 103 insertions(+), 21 deletions(-)
>  create mode 100644 include/trace/events/nilfs2.h
> 
> v3: undo rename
> 
> v2: correct the email address of author
> 


Looks good.  I pushed out this patch as "tracepoints" branch of
nilfs2.git[1].

If you hope it to be sent to upstream separately at this time, please
let me know.  (In that case, the following description should be
revised to be ready for mainline merge).

> Of course the tracepoint added by this patch is very limited, so we
> need to add more points for detailed analysis. This patch is something
> like demonstration. If this concept is acceptable for the nilfs
> community, I'd like to add more tracepoints and prepare analysis
> tools.

Otherwise, I'll keep it in the tracepoints branch for now.

[1] https://github.com/konis/nilfs2.git

Thanks,
Ryusuke Konishi

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Ryusuke Konishi
On Sat, 13 Sep 2014 16:37:53 +0200, Andreas Rohner wrote:
> Under normal circumstances nilfs_sync_fs() writes out the super block,
> which causes a flush of the underlying block device. But this depends on
> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
> last segment crosses a segment boundary. So if only a small amount of
> data is written before the call to nilfs_sync_fs(), no flush of the
> block device occurs.
> 
> In the above case an additional call to blkdev_issue_flush() is needed.
> To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
> is introduced, which is cleared whenever new logs are written and set
> whenever the block device is flushed. For convenience the function
> nilfs_flush_device() is added, which contains the above logic.
> 
> Signed-off-by: Andreas Rohner 

Applied, thank you!

Ryusuke Konishi

> ---
>  fs/nilfs2/file.c  |  8 +++-
>  fs/nilfs2/ioctl.c |  8 +++-
>  fs/nilfs2/segment.c   |  3 +++
>  fs/nilfs2/super.c |  6 ++
>  fs/nilfs2/the_nilfs.h | 22 ++
>  5 files changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
> index 2497815..e9e3325 100644
> --- a/fs/nilfs2/file.c
> +++ b/fs/nilfs2/file.c
> @@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   mutex_unlock(&inode->i_mutex);
>  
>   nilfs = inode->i_sb->s_fs_info;
> - if (!err && nilfs_test_opt(nilfs, BARRIER)) {
> - err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
> - if (err != -EIO)
> - err = 0;
> - }
> + if (!err)
> + err = nilfs_flush_device(nilfs);
> +
>   return err;
>  }
>  
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 422fb54..9a20e51 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, 
> struct file *filp,
>   return ret;
>  
>   nilfs = inode->i_sb->s_fs_info;
> - if (nilfs_test_opt(nilfs, BARRIER)) {
> - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
> - if (ret == -EIO)
> - return ret;
> - }
> + ret = nilfs_flush_device(nilfs);
> + if (ret < 0)
> + return ret;
>  
>   if (argp != NULL) {
>   down_read(&nilfs->ns_segctor_sem);
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a1a1916..0b7d2ca 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct 
> nilfs_sc_info *sci)
>   nilfs_set_next_segment(nilfs, segbuf);
>  
>   if (update_sr) {
> + nilfs->ns_flushed_device = 0;
>   nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start,
>  segbuf->sb_sum.seg_seq, nilfs->ns_cno++);
>  
> @@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block 
> *sb, struct inode *inode,
>   sci->sc_dsync_end = end;
>  
>   err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC);
> + if (!err)
> + nilfs->ns_flushed_device = 0;
>  
>   nilfs_transaction_unlock(sb);
>   return err;
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 228f5bd..2e5b3ec 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag)
>   nilfs->ns_sbsize));
>   }
>   clear_nilfs_sb_dirty(nilfs);
> + nilfs->ns_flushed_device = 1;
> + /* make sure store to ns_flushed_device cannot be reordered */
> + smp_wmb();
>   return nilfs_sync_super(sb, flag);
>  }
>  
> @@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
>   }
>   up_write(&nilfs->ns_sem);
>  
> + if (!err)
> + err = nilfs_flush_device(nilfs);
> +
>   return err;
>  }
>  
> diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
> index d01ead1..23778d3 100644
> --- a/fs/nilfs2/the_nilfs.h
> +++ b/fs/nilfs2/the_nilfs.h
> @@ -46,6 +46,7 @@ enum {
>  /**
>   * struct the_nilfs - struct to supervise multiple nilfs mount points
>   * @ns_flags: flags
> + * @ns_flushed_device: flag indicating if all volatile data was flushed
>   * @ns_bdev: block device
>   * @ns_sem: semaphore for shared states
>   * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts
> @@ -103,6 +104,7 @@ enum {
>   */
>  struct the_nilfs {
>   unsigned long   ns_flags;
> + int ns_flushed_device;
>  
>   struct block_device*ns_bdev;
>   struct rw_semaphore ns_sem;
> @@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct 
> the_nilfs *nilfs, __u64 n)
>   return n == nilfs->ns_segnum || n == nilfs->ns_nextnum;
>  }
>  
> +static inline int nilfs_flush_device(struct the_nilfs *

[PATCH v3] nilfs2: add a tracepoint for tracking stage transition of segment construction

2014-09-13 Thread Hitoshi Mitake
This patch adds a tracepoint for tracking stage transition of block
collection in segment construction. With the tracepoint, we can
analysis the behavior of segment construction in depth. It would be
useful for bottleneck detection and debugging, etc.

The tracepoint is created with the standard trace API of linux (like
ext3, ext4, f2fs and btrfs). So we can analysis with existing tools
easily. Of course, more detailed analysis will be possible if we can
create nilfs specific analysis tools.

Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools). Time consumption between
each stage can be obtained.

$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
segctord-14875 [003] ...1 28311.067794: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT
segctord-14875 [003] ...1 28311.068139: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC
segctord-14875 [003] ...1 28311.068139: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE
segctord-14875 [003] ...1 28311.068486: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE
segctord-14875 [003] ...1 28311.068540: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE
segctord-14875 [003] ...1 28311.068561: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE
segctord-14875 [003] ...1 28311.068565: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT
segctord-14875 [003] ...1 28311.068573: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR
segctord-14875 [003] ...1 28311.068574: 
nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE

For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage. With this change, every transition of the
stage can produce trace event in a correct manner.

Of course the tracepoint added by this patch is very limited, so we
need to add more points for detailed analysis. This patch is something
like demonstration. If this concept is acceptable for the nilfs
community, I'd like to add more tracepoints and prepare analysis
tools.

Signed-off-by: Hitoshi Mitake 
---
 fs/nilfs2/segment.c   | 71 +++
 fs/nilfs2/segment.h   |  3 +-
 include/trace/events/nilfs2.h | 50 ++
 3 files changed, 103 insertions(+), 21 deletions(-)
 create mode 100644 include/trace/events/nilfs2.h

v3: undo rename

v2: correct the email address of author

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..0fcf8e7 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -76,6 +76,36 @@ enum {
NILFS_ST_DONE,
 };
 
+#define CREATE_TRACE_POINTS
+#include 
+
+/*
+ * nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), nilfs_sc_cstage_get() are
+ * wrapper functions of stage count (nilfs_sc_info->sc_stage.scnt). Users of
+ * the variable must use them because transition of stage count must involve
+ * trace events (trace_nilfs2_collection_stage_transition).
+ *
+ * nilfs_sc_cstage_get() isn't required for the above purpose because it 
doesn't
+ * produce tracepoint events. It is provided just for making the intention
+ * clear.
+ */
+static inline void nilfs_sc_cstage_inc(struct nilfs_sc_info *sci)
+{
+   sci->sc_stage.scnt++;
+   trace_nilfs2_collection_stage_transition(sci);
+}
+
+static inline void nilfs_sc_cstage_set(struct nilfs_sc_info *sci, int 
next_scnt)
+{
+   sci->sc_stage.scnt = next_scnt;
+   trace_nilfs2_collection_stage_transition(sci);
+}
+
+static inline int nilfs_sc_cstage_get(struct nilfs_sc_info *sci)
+{
+   return sci->sc_stage.scnt;
+}
+
 /* State flags of collection */
 #define NILFS_CF_NODE  0x0001  /* Collecting node blocks */
 #define NILFS_CF_IFILE_STARTED 0x0002  /* IFILE stage has started */
@@ -1055,7 +1085,7 @@ static int nilfs_segctor_collect_blocks(struct 
nilfs_sc_info *sci, int mode)
size_t ndone;
int err = 0;
 
-   switch (sci->sc_stage.scnt) {
+   switch (nilfs_sc_cstage_get(sci)) {
case NILFS_ST_INIT:
/* Pre-processes */
sci->sc_stage.flags = 0;
@@ -1064,7 +1094,7 @@ static int nilfs_segctor_collect_blocks(struct 
nilfs_sc_info *sci, int mode)
sci->sc_nblk_inc = 0;
sci->sc_curseg->sb_sum.flags = NILFS_SS_LOGBGN;
if (mode == SC_LSEG_DSYNC) {
-   sci->sc_stage.scnt = NILFS_ST_DSYNC;
+   nilfs_sc_cstage_set(sci, NILFS_ST_DSYNC);
goto dsync_mode;
}
}
@@ -1072,10 +1102,10 @@ static int nilfs_segctor_collect

Re: [PATCH v2] nilfs2: add a tracepoint for tracking stage transition of segment construction

2014-09-13 Thread Hitoshi Mitake
On Fri, Sep 5, 2014 at 3:40 PM, Vyacheslav Dubeyko  wrote:
> On Fri, 2014-09-05 at 11:41 +0900, Hitoshi Mitake wrote:
>
> [snip]
>> diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h
>> index 38a1d00..1e9b931 100644
>> --- a/fs/nilfs2/segment.h
>> +++ b/fs/nilfs2/segment.h
>> @@ -66,13 +66,14 @@ struct nilfs_recovery_info {
>>
>>  /**
>>   * struct nilfs_cstage - Context of collection stage
>> - * @scnt: Stage count
>> + * @__scnt: Stage count, must be accessed via wrappers:
>> + *  nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), 
>> nilfs_sc_cstage_get()
>>   * @flags: State flags
>>   * @dirty_file_ptr: Pointer on dirty_files list, or inode of a target file
>>   * @gc_inode_ptr: Pointer on the list of gc-inodes
>>   */
>>  struct nilfs_cstage {
>> - int scnt;
>> + int __scnt;
>
> It doesn't make sense this change for my taste. As far as I can judge,
> you've made this change because scnt field should be accessed by means
> of wrappers. But __scnt name doesn't provide from direct using this
> field. So, I suppose that extended comment will be enough.

OK, I'll rename it in v3.

>
> By the way, do you plan to add tracepoints for another NILFS2
> subsystems?

Yes.

Thanks,
Hitoshi

>
> Thanks,
> Vyacheslav Dubeyko.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Andreas Rohner
Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device. But this depends on
the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary. So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed. For convenience the function
nilfs_flush_device() is added, which contains the above logic.

Signed-off-by: Andreas Rohner 
---
 fs/nilfs2/file.c  |  8 +++-
 fs/nilfs2/ioctl.c |  8 +++-
 fs/nilfs2/segment.c   |  3 +++
 fs/nilfs2/super.c |  6 ++
 fs/nilfs2/the_nilfs.h | 22 ++
 5 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index 2497815..e9e3325 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t 
end, int datasync)
mutex_unlock(&inode->i_mutex);
 
nilfs = inode->i_sb->s_fs_info;
-   if (!err && nilfs_test_opt(nilfs, BARRIER)) {
-   err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-   if (err != -EIO)
-   err = 0;
-   }
+   if (!err)
+   err = nilfs_flush_device(nilfs);
+
return err;
 }
 
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 422fb54..9a20e51 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, struct 
file *filp,
return ret;
 
nilfs = inode->i_sb->s_fs_info;
-   if (nilfs_test_opt(nilfs, BARRIER)) {
-   ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-   if (ret == -EIO)
-   return ret;
-   }
+   ret = nilfs_flush_device(nilfs);
+   if (ret < 0)
+   return ret;
 
if (argp != NULL) {
down_read(&nilfs->ns_segctor_sem);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..0b7d2ca 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct 
nilfs_sc_info *sci)
nilfs_set_next_segment(nilfs, segbuf);
 
if (update_sr) {
+   nilfs->ns_flushed_device = 0;
nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start,
   segbuf->sb_sum.seg_seq, nilfs->ns_cno++);
 
@@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *sb, 
struct inode *inode,
sci->sc_dsync_end = end;
 
err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC);
+   if (!err)
+   nilfs->ns_flushed_device = 0;
 
nilfs_transaction_unlock(sb);
return err;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 228f5bd..2e5b3ec 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag)
nilfs->ns_sbsize));
}
clear_nilfs_sb_dirty(nilfs);
+   nilfs->ns_flushed_device = 1;
+   /* make sure store to ns_flushed_device cannot be reordered */
+   smp_wmb();
return nilfs_sync_super(sb, flag);
 }
 
@@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
}
up_write(&nilfs->ns_sem);
 
+   if (!err)
+   err = nilfs_flush_device(nilfs);
+
return err;
 }
 
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index d01ead1..23778d3 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -46,6 +46,7 @@ enum {
 /**
  * struct the_nilfs - struct to supervise multiple nilfs mount points
  * @ns_flags: flags
+ * @ns_flushed_device: flag indicating if all volatile data was flushed
  * @ns_bdev: block device
  * @ns_sem: semaphore for shared states
  * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts
@@ -103,6 +104,7 @@ enum {
  */
 struct the_nilfs {
unsigned long   ns_flags;
+   int ns_flushed_device;
 
struct block_device*ns_bdev;
struct rw_semaphore ns_sem;
@@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs 
*nilfs, __u64 n)
return n == nilfs->ns_segnum || n == nilfs->ns_nextnum;
 }
 
+static inline int nilfs_flush_device(struct the_nilfs *nilfs)
+{
+   int err;
+
+   if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device)
+   return 0;
+
+   nilfs->ns_flushed_device = 1;
+   /*
+* the store to ns_flushed_device must not be reordered after
+* blkdev_

[PATCH v6 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Andreas Rohner
Hi,

I have looked a bit more into the semantics of the various flags
concerning block device caching behaviour. According to
"Documentation/block/writeback_cache_control.txt" a call to
blkdev_issue_flush() is equivalent to an empty bio with the
REQ_FLUSH flag set. So there is no need to call blkdev_issue_flush()
after a call to nilfs_commit_super(). But if there is no need to write
the super block an additional call to blkdev_issue_flush() is necessary.

To avoid an overhead I introduced the nilfs->ns_flushed_device flag, 
which is set to 0  whenever new logs are written and set to 1 whenever 
the block device is flushed. If the super block was written during 
segment construction or in nilfs_sync_fs(), then blkdev_issue_flush() is 
not called.

br,
Andreas Rohner

v5->v6 (review by Ryusuke Konishi)
 * Remove special handling of EIO error state from nilfs_ioctl_sync()

v4->v5 (review by Ryusuke Konishi)
 * Move device flushing logic into separate function
 * Fix invalid comment
 * Move clearing of the flag to nilfs_segctor_complete_write() and
   nilfs_construct_dsync_segment()

v3->v4 (review by Ryusuke Konishi)
 * Replace atomic_t with int for ns_flushed_device
 * Use smp_wmb() to guarantee correct ordering

v2->v3 (review of Ryusuke Konishi)
 * Use separate atomic flag for ns_flushed_device instead of a bit flag 
   in ns_flags
 * Use smp_mb__after_atomic() after setting ns_flushed_device

v1->v2
 * Add new flag THE_NILFS_FLUSHED

Andreas Rohner (1):
  nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

 fs/nilfs2/file.c  |  8 +++-
 fs/nilfs2/ioctl.c |  8 +++-
 fs/nilfs2/segment.c   |  3 +++
 fs/nilfs2/super.c |  6 ++
 fs/nilfs2/the_nilfs.h | 22 ++
 5 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Ryusuke Konishi
On Sat, 13 Sep 2014 15:55:56 +0200, Andreas Rohner wrote:
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 422fb54..5a530f3 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, 
> struct file *filp,
>   return ret;
>  
>   nilfs = inode->i_sb->s_fs_info;
> - if (nilfs_test_opt(nilfs, BARRIER)) {
> - ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
> - if (ret == -EIO)
> - return ret;
> - }
> + ret = nilfs_flush_device(nilfs);

> + if (ret == -EIO)
> + return ret;

One more comment.  I think this special treatment of EIO should be
encapsulated in nilfs_flush_device().  nilfs_ioctl_sync() doesn't have
to know it:

if (ret < 0)
return ret;

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Andreas Rohner
Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device. But this depends on
the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary. So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed. For convenience the function
nilfs_flush_device() is added, which contains the above logic.

Signed-off-by: Andreas Rohner 
---
 fs/nilfs2/file.c  |  8 +++-
 fs/nilfs2/ioctl.c |  8 +++-
 fs/nilfs2/segment.c   |  3 +++
 fs/nilfs2/super.c |  6 ++
 fs/nilfs2/the_nilfs.h | 22 ++
 5 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index 2497815..e9e3325 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -56,11 +56,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t 
end, int datasync)
mutex_unlock(&inode->i_mutex);
 
nilfs = inode->i_sb->s_fs_info;
-   if (!err && nilfs_test_opt(nilfs, BARRIER)) {
-   err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-   if (err != -EIO)
-   err = 0;
-   }
+   if (!err)
+   err = nilfs_flush_device(nilfs);
+
return err;
 }
 
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 422fb54..5a530f3 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1022,11 +1022,9 @@ static int nilfs_ioctl_sync(struct inode *inode, struct 
file *filp,
return ret;
 
nilfs = inode->i_sb->s_fs_info;
-   if (nilfs_test_opt(nilfs, BARRIER)) {
-   ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-   if (ret == -EIO)
-   return ret;
-   }
+   ret = nilfs_flush_device(nilfs);
+   if (ret == -EIO)
+   return ret;
 
if (argp != NULL) {
down_read(&nilfs->ns_segctor_sem);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..0b7d2ca 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct 
nilfs_sc_info *sci)
nilfs_set_next_segment(nilfs, segbuf);
 
if (update_sr) {
+   nilfs->ns_flushed_device = 0;
nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start,
   segbuf->sb_sum.seg_seq, nilfs->ns_cno++);
 
@@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *sb, 
struct inode *inode,
sci->sc_dsync_end = end;
 
err = nilfs_segctor_do_construct(sci, SC_LSEG_DSYNC);
+   if (!err)
+   nilfs->ns_flushed_device = 0;
 
nilfs_transaction_unlock(sb);
return err;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 228f5bd..2e5b3ec 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -310,6 +310,9 @@ int nilfs_commit_super(struct super_block *sb, int flag)
nilfs->ns_sbsize));
}
clear_nilfs_sb_dirty(nilfs);
+   nilfs->ns_flushed_device = 1;
+   /* make sure store to ns_flushed_device cannot be reordered */
+   smp_wmb();
return nilfs_sync_super(sb, flag);
 }
 
@@ -514,6 +517,9 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
}
up_write(&nilfs->ns_sem);
 
+   if (!err)
+   err = nilfs_flush_device(nilfs);
+
return err;
 }
 
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index d01ead1..23778d3 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -46,6 +46,7 @@ enum {
 /**
  * struct the_nilfs - struct to supervise multiple nilfs mount points
  * @ns_flags: flags
+ * @ns_flushed_device: flag indicating if all volatile data was flushed
  * @ns_bdev: block device
  * @ns_sem: semaphore for shared states
  * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts
@@ -103,6 +104,7 @@ enum {
  */
 struct the_nilfs {
unsigned long   ns_flags;
+   int ns_flushed_device;
 
struct block_device*ns_bdev;
struct rw_semaphore ns_sem;
@@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs 
*nilfs, __u64 n)
return n == nilfs->ns_segnum || n == nilfs->ns_nextnum;
 }
 
+static inline int nilfs_flush_device(struct the_nilfs *nilfs)
+{
+   int err;
+
+   if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device)
+   return 0;
+
+   nilfs->ns_flushed_device = 1;
+   /*
+* the store to ns_flushed_device must not be reordered after
+* blk

[PATCH v5 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Andreas Rohner
Hi,

I have looked a bit more into the semantics of the various flags
concerning block device caching behaviour. According to
"Documentation/block/writeback_cache_control.txt" a call to
blkdev_issue_flush() is equivalent to an empty bio with the
REQ_FLUSH flag set. So there is no need to call blkdev_issue_flush()
after a call to nilfs_commit_super(). But if there is no need to write
the super block an additional call to blkdev_issue_flush() is necessary.

To avoid an overhead I introduced the nilfs->ns_flushed_device flag, 
which is set to 0  whenever new logs are written and set to 1 whenever 
the block device is flushed. If the super block was written during 
segment construction or in nilfs_sync_fs(), then blkdev_issue_flush() is 
not called.

br,
Andreas Rohner

v4->v5 (review by Ryusuke Konishi)
 * Move device flushing logic into separate function
 * Fix invalid comment
 * Move clearing of the flag to nilfs_segctor_complete_write() and
   nilfs_construct_dsync_segment()

v3->v4 (review by Ryusuke Konishi)
 * Replace atomic_t with int for ns_flushed_device
 * Use smp_wmb() to guarantee correct ordering

v2->v3 (review of Ryusuke Konishi)
 * Use separate atomic flag for ns_flushed_device instead of a bit flag 
   in ns_flags
 * Use smp_mb__after_atomic() after setting ns_flushed_device

v1->v2
 * Add new flag THE_NILFS_FLUSHED

Andreas Rohner (1):
  nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

 fs/nilfs2/file.c  |  8 +++-
 fs/nilfs2/ioctl.c |  8 +++-
 fs/nilfs2/segment.c   |  3 +++
 fs/nilfs2/super.c |  6 ++
 fs/nilfs2/the_nilfs.h | 22 ++
 5 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-13 Thread Ryusuke Konishi
On Tue,  9 Sep 2014 23:17:09 +0200, Andreas Rohner wrote:
> Under normal circumstances nilfs_sync_fs() writes out the super block,
> which causes a flush of the underlying block device. But this depends on
> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
> last segment crosses a segment boundary. So if only a small amount of
> data is written before the call to nilfs_sync_fs(), no flush of the
> block device occurs.
> 
> In the above case an additional call to blkdev_issue_flush() is needed.
> To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
> is introduced, which is cleared whenever new logs are written and set
> whenever the block device is flushed.
> 
> Signed-off-by: Andreas Rohner 
> ---
>  fs/nilfs2/file.c  | 10 +-
>  fs/nilfs2/ioctl.c | 10 +-
>  fs/nilfs2/segment.c   |  4 
>  fs/nilfs2/super.c | 17 +
>  fs/nilfs2/the_nilfs.h |  2 ++
>  5 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
> index 2497815..16375c2 100644
> --- a/fs/nilfs2/file.c
> +++ b/fs/nilfs2/file.c
> @@ -56,7 +56,15 @@ int nilfs_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   mutex_unlock(&inode->i_mutex);
>  
>   nilfs = inode->i_sb->s_fs_info;
> - if (!err && nilfs_test_opt(nilfs, BARRIER)) {
> + if (!err && nilfs_test_opt(nilfs, BARRIER) &&
> + !nilfs->ns_flushed_device) {
> + nilfs->ns_flushed_device = 1;
> + /*
> +  * the store to ns_flushed_device must not be reordered after
> +  * blkdev_issue_flush
> +  */
> + smp_wmb();
> +
>   err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
>   if (err != -EIO)
>   err = 0;

Looks good.  But, these code lines and the comment appear repeatedly.
To simplify these, I propose to add the following inline function to
the_nilfs.h.

@@ -371,4 +373,24 @@ static inline int nilfs_segment_is_active(struct the_nilfs 
*nilfs, __u64 n)
return n == nilfs->ns_segnum || n == nilfs->ns_nextnum;
 }
 
+static inline int nilfs_flush_device(struct the_nilfs *nilfs)
+{
+   int err;
+
+   if (!nilfs_test_opt(nilfs, BARRIER) || nilfs->ns_flushed_device)
+   return 0;
+
+   nilfs->ns_flushed_device = 1;
+   /*
+* the store to ns_flushed_device must not be reordered after
+* blkdev_issue_flush().
+*/
+   smp_wmb();
+
+   err = blkdev_issue_flush(nilfs->ns_bdev, GFP_KERNEL, NULL);
+   if (err != -EIO)
+   err = 0;
+   return err;
+}
+
 #endif /* _THE_NILFS_H */

> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 422fb54..9444d5d 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -1022,7 +1022,15 @@ static int nilfs_ioctl_sync(struct inode *inode, 
> struct file *filp,
>   return ret;
>  
>   nilfs = inode->i_sb->s_fs_info;
> - if (nilfs_test_opt(nilfs, BARRIER)) {
> + if (nilfs_test_opt(nilfs, BARRIER) &&
> + !nilfs->ns_flushed_device) {
> + nilfs->ns_flushed_device = 1;
> + /*
> +  * the store to ns_flushed_device must not be reordered after
> +  * blkdev_issue_flush
> +  */
> + smp_wmb();
> +
>   ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
>   if (ret == -EIO)
>   return ret;
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a1a1916..379da1b 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1997,6 +1997,10 @@ static int nilfs_segctor_do_construct(struct 
> nilfs_sc_info *sci, int mode)
>   err = nilfs_segctor_wait(sci);
>   if (err)
>   goto failed_to_write;
> +
> + if (test_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags) ||
> + mode == SC_LSEG_DSYNC)
> + nilfs->ns_flushed_device = 0;
>   }
>   } while (sci->sc_stage.scnt != NILFS_ST_DONE);

We can simplify this by inserting "nilfs->ns_flushed_device = 0" in
nilfs_segctor_complete_write() and nilfs_construct_dsync_segment()
separately as follows.  Explicit memory barriers are not required in
the following changes because both nilfs_set_last_segment() and
nilfs_transaction_unlock() imply a memory barrier.

--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1833,6 +1833,7 @@ static void nilfs_segctor_complete_write(struct 
nilfs_sc_info *sci)
nilfs_set_next_segment(nilfs, segbuf);
 
if (update_sr) {
+   nilfs->ns_flushed_device = 0;
nilfs_set_last_segment(nilfs, segbuf->sb_pseg_start,
   segbuf->sb_sum.seg_seq, nilfs->ns_cno++);
 
@@ -2216,6 +2217,8 @@ int nilfs_construct_dsync_segment(struct super_block *