Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-03 Thread Andreas Rohner
On 2014-09-03 02:35, Ryusuke Konishi wrote:
> On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
>> On 2014-09-01 20:43, Andreas Rohner wrote:
>>> Hi Ryusuke,
>>> On 2014-09-01 19:59, Ryusuke Konishi wrote:
>>>> On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
>>>>> Under normal circumstances nilfs_sync_fs() writes out the super block,
>>>>> which causes a flush of the underlying block device. But this depends on
>>>>> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
>>>>> last segment crosses a segment boundary. So if only a small amount of
>>>>> data is written before the call to nilfs_sync_fs(), no flush of the
>>>>> block device occurs.
>>>>>
>>>>> In the above case an additional call to blkdev_issue_flush() is needed.
>>>>> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
>>>>> introduced, which is cleared whenever new logs are written and set
>>>>> whenever the block device is flushed.
>>>>>
>>>>> Signed-off-by: Andreas Rohner 
>>>>
>>>> The patch looks good to me, except that I feel the use of atomic
>>>> test-and-set bitwise operations is somewhat unfavorable (though it's
>>>> logically correct).  I will try to send this upstream as is unless
>>>> a comment comes to mind.
>>>
>>> I originally thought that it was necessary to do it atomically to avoid
>>> a race condition, but I am not so sure about that any more. I think the
>>> only case we have to avoid is calling set_nilfs_flushed() after
>>> blkdev_issue_flush(), because this could race with the
>>> clear_nilfs_flushed() from the segment construction. So this should also
>>> work:
>>>
>>>  +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
>>>  +	    !nilfs_flushed(nilfs)) {
>>>  +		set_nilfs_flushed(nilfs);
>>>  +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
>>>  +		if (err != -EIO)
>>>  +			err = 0;
>>>  +	}
>>>  +
>>
>> On the other hand, the comments on set_bit() say that it can be
>> reordered on architectures other than x86. test_and_set_bit() implies a
>> memory barrier on all architectures. But I don't think the processor
>> would reorder set_nilfs_flushed() after the external function call to
>> blkdev_issue_flush(), would it?
> 
> I believe the compiler doesn't reorder a set_bit() operation after an
> external function call unless it knows the content of the function and
> the function can be optimized.  But, yes, set_bit() doesn't imply a
> memory barrier, unlike test_and_set_bit().  As for
> blkdev_issue_flush(), it would imply a memory barrier through some lock
> functions or other primitives used inside it.  (I haven't actually
> confirmed that the premise is true.)

Yes, blkdev_issue_flush() probably implies a memory barrier.

> On the other hand, we need an explicit barrier operation like
> smp_mb__after_atomic() if a certain operation is performed after
> set_bit() and the changed bit should be visible to other processors
> before the operation.

Great suggestion. I didn't know about those functions. Do we also need a
call to smp_mb__before_atomic() before clear_nilfs_flushed(nilfs) in
segment.c?

I would be happy to provide another version of the patch with
set_nilfs_flushed(nilfs) and smp_mb__after_atomic() if you prefer that
version over the test_and_set_bit approach...
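
To make the ordering under discussion concrete, here is a minimal userspace
sketch using C11 atomics. It is only a model under stated assumptions, not
kernel code: struct fake_nilfs, fake_issue_flush(), fake_write_log() and
fake_sync() are hypothetical stand-ins for the nilfs object,
blkdev_issue_flush(), the segment construction path and nilfs_sync_fs(), and
the seq_cst fence merely plays the role that smp_mb__after_atomic() would play
after set_nilfs_flushed().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in nilfs2. */
struct fake_nilfs {
	atomic_bool flushed;	/* models THE_NILFS_FLUSHED */
	int flush_count;	/* number of simulated device flushes */
};

/* Stand-in for blkdev_issue_flush(); it always succeeds in this model. */
static int fake_issue_flush(struct fake_nilfs *n)
{
	n->flush_count++;
	return 0;
}

/*
 * Segment construction side: new logs were written, so a flush is owed.
 * The release store keeps earlier writes ahead of the flag clear.
 */
static void fake_write_log(struct fake_nilfs *n)
{
	assert(n != NULL);
	atomic_store_explicit(&n->flushed, false, memory_order_release);
}

/*
 * Sync side: mark the device as flushed *before* issuing the flush, so that
 * a concurrent fake_write_log() can only clear the flag afterwards and the
 * next sync flushes again.
 */
static int fake_sync(struct fake_nilfs *n)
{
	assert(n != NULL);
	if (atomic_load_explicit(&n->flushed, memory_order_relaxed))
		return 0;	/* nothing written since the last flush */

	atomic_store_explicit(&n->flushed, true, memory_order_relaxed);
	/* Full fence: the flag update is ordered before the flush call. */
	atomic_thread_fence(memory_order_seq_cst);
	return fake_issue_flush(n);
}
```

In this model a second fake_sync() without an intervening fake_write_log() is
a no-op, which is the overhead saving the flag is meant to provide. Whether a
matching barrier is needed on the clearing side in the real code is exactly
the open question above.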

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] nilfs2: add a tracepoint for tracking stage transition of segment construction

2014-09-03 Thread Ryusuke Konishi
Hi Mitake-san,
On Tue,  2 Sep 2014 21:19:39 +0900, Mitake Hitoshi wrote:
> From: Hitoshi Mitake 
> 
> This patch adds a tracepoint for tracking stage transition of block
> collection in segment construction. With the tracepoint, we can
> analyze the behavior of segment construction in depth. It would be
> useful for bottleneck detection, debugging, etc.
> 
> The tracepoint is created with the standard trace API of Linux (like
> ext3, ext4, f2fs and btrfs). So we can analyze it with existing tools
> easily. Of course, more detailed analysis will be possible if we can
> create nilfs specific analysis tools.
> 
> Below is an example of event dump with Brendan Gregg's perf-tools
> (https://github.com/brendangregg/perf-tools). Time consumption between
> each stage can be obtained.
> 
> $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
> Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
> segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT
> segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC
> segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE
> segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE
> segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE
> segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE
> segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT
> segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR
> segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE
> 
> For capturing transition correctly, this patch renames the member scnt
> of nilfs_cstage and adds wrappers for the member. With this change,
> every transition of the stage can produce trace event in a correct
> manner.
> 
> Of course, the tracepoint added by this patch is very limited, so we
> need to add more points for detailed analysis. This patch is something
> like a demonstration. If this concept is acceptable for the nilfs
> community, I'd like to add more tracepoints and prepare analysis
> tools.

Great!

This tracepoint support looks like what I wanted to introduce to
nilfs2 to help with debugging and performance analysis. After trying
this patch with the perf-tools, I found it really nice, though I am not
familiar with how tracepoints are used.
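
As an illustration of the "time consumption between each stage" mentioned in
the quoted description, the following is a minimal sketch in C that takes the
nine timestamps from the dump above (converted to microseconds) and computes
the difference between consecutive events. The names stage_ts_us and
stage_deltas() are hypothetical, and the sketch assumes each event marks the
moment a stage is entered.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Timestamps copied from the quoted dump, in microseconds, in the order
 * ST_INIT, ST_GC, ST_FILE, ST_IFILE, ST_CPFILE, ST_SUFILE, ST_DAT, ST_SR,
 * ST_DONE.
 */
static const long long stage_ts_us[] = {
	28311067794LL, 28311068139LL, 28311068139LL,
	28311068486LL, 28311068540LL, 28311068561LL,
	28311068565LL, 28311068573LL, 28311068574LL,
};

/*
 * Hypothetical helper: writes the n - 1 differences between consecutive
 * timestamps into out[]. out must have room for n - 1 entries.
 */
static void stage_deltas(const long long *ts, size_t n, long long *out)
{
	assert(n >= 1);
	for (size_t i = 1; i < n; i++)
		out[i - 1] = ts[i] - ts[i - 1];
}
```

With these values, the gap between ST_INIT and ST_GC comes out as 345
microseconds and the whole sequence from ST_INIT to ST_DONE spans 780
microseconds.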

Could you proceed with this work, starting from what you think is
useful?  I will help send this work upstream step by step, and would
like to extend it while learning the various tracepoint features.

By the way, your mail addresses differ between the author line (From
line) and the Signed-off-by line.  Could you include a "From" line so
that the mail addresses match?

Thanks,
Ryusuke Konishi

> Signed-off-by: Hitoshi Mitake 
> ---
>  fs/nilfs2/segment.c   | 70 ++-
>  fs/nilfs2/segment.h   |  5 ++--
>  include/trace/events/nilfs2.h | 50 +++
>  3 files changed, 103 insertions(+), 22 deletions(-)
>  create mode 100644 include/trace/events/nilfs2.h
> 
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a1a1916..e841e22 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -76,6 +76,35 @@ enum {
>   NILFS_ST_DONE,
>  };
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/nilfs2.h>
> +
> +/*
> + * nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), nilfs_sc_cstage_get() are
> + * wrapper functions of stage count (nilfs_sc_info->sc_stage.__scnt). Users of
> + * the variable must use them because transition of stage count must involve
> + * trace events (trace_nilfs2_collection_stage_transition).
> + *
> + * nilfs_sc_cstage_get() isn't required for the above purpose because it doesn't
> + * produce events. It is provided just for making the intention clear.
> + */
> +static inline void nilfs_sc_cstage_inc(struct nilfs_sc_info *sci)
> +{
> +	sci->sc_stage.__scnt++;
> +	trace_nilfs2_collection_stage_transition(sci);
> +}
> +
> +static inline void nilfs_sc_cstage_set(struct nilfs_sc_info *sci, int next_scnt)
> +{
> +	sci->sc_stage.__scnt = next_scnt;
> +	trace_nilfs2_collection_stage_transition(sci);
> +}
> +
> +static inline int nilfs_sc_cstage_get(struct nilfs_sc_info *sci)
> +{
> +	return sci->sc_stage.__scnt;
> +}
> +
>  /* State flags of collection */
>  #define NILFS_CF_NODE            0x0001  /* Collecting node blocks */
>  #define NILFS_CF_IFILE_STARTED   0x0002  /* IFILE stage has started */
> @@