Re: [PATCH] f2fs: fix reference leaks in f2fs_acl_create
Reviewed-by: Changman Lee

On Mon, Mar 09, 2015 at 06:18:19PM +0800, Chao Yu wrote:
> Our f2fs_acl_create is copied and modified from posix_acl_create to avoid
> deadlock bug when inline_dentry feature is enabled.
>
> Now, we got reference leaks in posix_acl_create, and this has been fixed in
> commit fed0b588be2f ("posix_acl: fix reference leaks in posix_acl_create")
> by Omar Sandoval.
> https://lkml.org/lkml/2015/2/9/5
>
> Let's fix this issue in f2fs_acl_create too.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/acl.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index 7422027..4320ffa 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -351,13 +351,11 @@ static int f2fs_acl_create(struct inode *dir, umode_t *mode,
>
>  	*acl = f2fs_acl_clone(p, GFP_NOFS);
>  	if (!*acl)
> -		return -ENOMEM;
> +		goto no_mem;
>
>  	ret = f2fs_acl_create_masq(*acl, mode);
> -	if (ret < 0) {
> -		posix_acl_release(*acl);
> -		return -ENOMEM;
> -	}
> +	if (ret < 0)
> +		goto no_mem_clone;
>
>  	if (ret == 0) {
>  		posix_acl_release(*acl);
> @@ -378,6 +376,12 @@ no_acl:
>  	*default_acl = NULL;
>  	*acl = NULL;
>  	return 0;
> +
> +no_mem_clone:
> +	posix_acl_release(*acl);
> +no_mem:
> +	posix_acl_release(p);
> +	return -ENOMEM;
>  }
>
>  int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage,
> --
> 2.3.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
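The fix above converts the two early error returns into a shared cleanup path, so that `p` (the ACL obtained earlier in the function) is released on every failure, not just the clone. A minimal userspace sketch of that control flow, with a toy refcounted object standing in for `posix_acl` (the names `acl_alloc`, `create_acls`, and the `live_objects` leak counter are illustrative, not f2fs code; the two `fail_*` knobs simulate the clone and masq failure points):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a posix_acl: just a refcount. */
struct acl { int refcount; };

static int live_objects;	/* leak detector: allocations minus frees */

static struct acl *acl_alloc(void)
{
	struct acl *a = malloc(sizeof(*a));
	a->refcount = 1;
	live_objects++;
	return a;
}

static void acl_release(struct acl *a)
{
	if (a && --a->refcount == 0) {
		free(a);
		live_objects--;
	}
}

/* Mirrors the patched control flow: a failure before the clone exists
 * releases only 'p' (no_mem); a failure after it releases both
 * (no_mem_clone falls through to no_mem). */
static int create_acls(int fail_clone, int fail_masq)
{
	struct acl *p = acl_alloc();	/* stands in for the ACL from get_acl() */
	struct acl *clone = NULL;

	if (fail_clone)
		goto no_mem;		/* f2fs_acl_clone() returned NULL */
	clone = acl_alloc();

	if (fail_masq)
		goto no_mem_clone;	/* f2fs_acl_create_masq() failed */

	acl_release(clone);
	acl_release(p);
	return 0;

no_mem_clone:
	acl_release(clone);
no_mem:
	acl_release(p);
	return -12;			/* -ENOMEM */
}
```

The point of the `goto` ladder is that every exit path releases exactly the references it holds; the pre-patch code returned from the clone-failure path without releasing `p`.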
Re: [f2fs-dev] [PATCH 1/3] f2fs:remove unnecessary condition judgment
Hi Yuan,

On Sat, Mar 07, 2015 at 10:05:25AM +0000, Yuan Zhong wrote:
> Remove the unnecessary condition judgment, because
> 'max_slots' has been initialized to '0' at the beginning
> of the function, as follows:
> 	if (max_slots)
> 		max_len = 0;

That statement is wrong; it should be fixed to read *max_slots = 0.

Thanks,

>
> Signed-off-by: Yuan Zhong
> ---
>  fs/f2fs/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 590aeef..1f1a1bc 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -139,7 +139,7 @@ struct f2fs_dir_entry *find_target_dentry(struct qstr *name, int *max_slots,
>  			!memcmp(d->filename[bit_pos], name->name, name->len))
>  			goto found;
>
> -		if (max_slots && *max_slots >= 0 && max_len > *max_slots) {
> +		if (max_slots && max_len > *max_slots) {
>  			*max_slots = max_len;
>  			max_len = 0;
>  		}
> --
> 1.7.9.5
>
> --
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [PATCH v2] f2fs: fix max orphan inodes calculation
-- >8 --
From ce2462523dd5940b59f770c09a50d4babff5fcdb Mon Sep 17 00:00:00 2001
From: Changman Lee
Date: Mon, 9 Mar 2015 08:07:04 +0900
Subject: [PATCH] f2fs: cleanup statement about max orphan inodes calc

Through each macro, we can read the meaning easily.

Signed-off-by: Changman Lee
---
 fs/f2fs/checkpoint.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 53bc328..384bfc4 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1104,13 +1104,6 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
 		im->ino_num = 0;
 	}

-	/*
-	 * considering 512 blocks in a segment 8+cp_payload blocks are
-	 * needed for cp and log segment summaries. Remaining blocks are
-	 * used to keep orphan entries with the limitation one reserved
-	 * segment for cp pack we can have max 1020*(504-cp_payload)
-	 * orphan entries
-	 */
 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
 			NR_CURSEG_TYPE - __cp_payload(sbi)) *
 				F2FS_ORPHANS_PER_BLOCK;
--
1.9.1
Re: [PATCH v2] f2fs: fix max orphan inodes calculation
On Fri, Feb 27, 2015 at 05:38:13PM +0800, Wanpeng Li wrote:
> cp_payload is introduced for the sit bitmap to support large volumes, and
> it sits just after the block of f2fs_checkpoint + nat bitmap, so the first
> segment should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan
> blocks. However, the current max orphan inodes calculation doesn't consider
> cp_payload. This patch fixes it by subtracting the number of cp_payload
> blocks from the total blocks of the first segment when calculating max
> orphan inodes.
>
> Signed-off-by: Wanpeng Li
> ---
> v1 -> v2:
>  * adjust comments above the codes
>  * fix coding style issue
>
>  fs/f2fs/checkpoint.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index db82e09..a914e99 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1103,13 +1103,15 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
>  	}
>
>  	/*
> -	 * considering 512 blocks in a segment 8 blocks are needed for cp
> -	 * and log segment summaries. Remaining blocks are used to keep
> -	 * orphan entries with the limitation one reserved segment
> -	 * for cp pack we can have max 1020*504 orphan entries
> +	 * considering 512 blocks in a segment 8+cp_payload blocks are
> +	 * needed for cp and log segment summaries. Remaining blocks are
> +	 * used to keep orphan entries with the limitation one reserved
> +	 * segment for cp pack we can have max 1020*(504-cp_payload)
> +	 * orphan entries
>  	 */

Hi all,

I think the code below gives us enough information, so we don't need the
comment above it. Also, someone could get confused by the 1020 constant.
What do you think about removing the comment?

Regards,
Changman

>  	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
> -			NR_CURSEG_TYPE) * F2FS_ORPHANS_PER_BLOCK;
> +			NR_CURSEG_TYPE - __cp_payload(sbi)) *
> +				F2FS_ORPHANS_PER_BLOCK;
>  }
>
>  int __init create_checkpoint_caches(void)
> --
> 1.9.1
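For concreteness, the corrected formula can be checked against the constants in the old comment: 8 blocks reserved (cp packs plus current segment summaries) out of 512 leaves 504 orphan blocks, each holding 1020 entries, i.e. the 1020*504 figure, and each cp_payload block removes one more orphan block. A standalone sketch (the constant values below match the comment's arithmetic; treat them as illustrative rather than authoritative f2fs definitions):

```c
#include <assert.h>

/* Values consistent with the "8 blocks" / "1020*504" comment above. */
#define F2FS_CP_PACKS		2	/* checkpoint pack blocks */
#define NR_CURSEG_TYPE		6	/* current segment summary blocks */
#define F2FS_ORPHANS_PER_BLOCK	1020	/* orphan ino entries per block */

/* Blocks of the cp segment left after cp packs, summaries and the sit
 * bitmap payload are subtracted; each remaining block holds
 * F2FS_ORPHANS_PER_BLOCK orphan entries. */
static unsigned int max_orphans(unsigned int blocks_per_seg,
				unsigned int cp_payload)
{
	return (blocks_per_seg - F2FS_CP_PACKS - NR_CURSEG_TYPE - cp_payload)
			* F2FS_ORPHANS_PER_BLOCK;
}
```

With 512 blocks per segment and no cp_payload this yields 504 * 1020 entries; the bug being fixed is that cp_payload blocks were previously not subtracted, overstating the limit.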
Re: [PATCH] f2fs: fix to issue small discard in real-time mode discard
On Sat, Feb 28, 2015 at 05:23:30PM +0800, Chao Yu wrote:
> Now in f2fs, we share functions and structures for batch mode and real-time
> mode discard. For real-time mode discard, in the shared function
> add_discard_addrs, we use an uninitialized trim_minlen in struct cp_control
> to compare against the length of contiguous free blocks when deciding
> whether to skip discarding fragmented free space; this makes us ignore
> small discards sometimes. Fix it.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/segment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index daee4ab..fcc1cc2 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -549,7 +549,7 @@ static void add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>
>  		end = __find_rev_next_zero_bit(dmap, max_blocks, start + 1);
>
> -		if (end - start < cpc->trim_minlen)
> +		if (force && end - start < cpc->trim_minlen)
>  			continue;

Reviewed-by: Changman Lee

>
>  		__add_discard_entry(sbi, cpc, start, end);
> --
> 2.3.1
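The effect of the one-line change is that the `trim_minlen` filter now applies only when `force` (batch-mode discard via fstrim) is set; real-time discard keeps even short free extents. A userspace sketch of the scan, using a plain byte array as a free-block map (the bitmap walk is simplified; only the `force && length` check mirrors the patch, and all names here are illustrative):

```c
/* Collect [start, end) discard ranges of consecutive free blocks from
 * free_map, skipping ranges shorter than trim_minlen only in "force"
 * (batch/fstrim) mode. Returns the number of ranges recorded. */
static int collect_discards(const unsigned char *free_map, int nblocks,
			    int force, int trim_minlen,
			    int (*ranges)[2], int max_ranges)
{
	int i = 0, n = 0;

	while (i < nblocks) {
		int start, end;

		while (i < nblocks && !free_map[i])	/* skip used blocks */
			i++;
		if (i >= nblocks)
			break;
		start = i;
		while (i < nblocks && free_map[i])	/* span free run */
			i++;
		end = i;

		/* the patched check: honor the user's minimum length
		 * only in batch mode, never for real-time discard */
		if (force && end - start < trim_minlen)
			continue;

		if (n < max_ranges) {
			ranges[n][0] = start;
			ranges[n][1] = end;
			n++;
		}
	}
	return n;
}
```

With `force == 0` every free run is collected, which is the behavior the patch restores for real-time discard.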
Re: [f2fs-dev] [PATCH 5/5 v2] f2fs: introduce a batched trim
Hi Jaegeuk,

IMHO, it would be better if the user could decide the size of each trim,
considering trim latency. Otherwise, additional checkpoints the user
doesn't want will occur.

Regards,
Changman

On Mon, Feb 02, 2015 at 03:29:25PM -0800, Jaegeuk Kim wrote:
> Change log from v1:
>  o add description
>  o change the # of batched segments suggested by Chao
>  o make consistent for # of batched segments
>
> This patch introduces a batched trimming feature, which submits split
> discard commands.
>
> This is to avoid long latency due to huge trim commands. If fstrim was
> triggered ranging from 0 to the end of the device, we should lock all the
> checkpoint-related mutexes, resulting in very long latency.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/f2fs.h    |  2 ++
>  fs/f2fs/segment.c | 16 +++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 8231a59..ec5e66f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -105,6 +105,8 @@ enum {
>  	CP_DISCARD,
>  };
>
> +#define BATCHED_TRIM_SEGMENTS(sbi)	(((sbi)->segs_per_sec) << 5)
> +
>  struct cp_control {
>  	int reason;
>  	__u64 trim_start;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 5ea57ec..b85bb97 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1066,14 +1066,20 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
>  	end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
>  						GET_SEGNO(sbi, end);
>  	cpc.reason = CP_DISCARD;
> -	cpc.trim_start = start_segno;
> -	cpc.trim_end = end_segno;
>  	cpc.trim_minlen = range->minlen >> sbi->log_blocksize;
>
>  	/* do checkpoint to issue discard commands safely */
> -	mutex_lock(&sbi->gc_mutex);
> -	write_checkpoint(sbi, &cpc);
> -	mutex_unlock(&sbi->gc_mutex);
> +	for (; start_segno <= end_segno;
> +			start_segno += BATCHED_TRIM_SEGMENTS(sbi)) {
> +		cpc.trim_start = start_segno;
> +		cpc.trim_end = min_t(unsigned int,
> +			start_segno + BATCHED_TRIM_SEGMENTS(sbi) - 1,
> +			end_segno);
> +
> +		mutex_lock(&sbi->gc_mutex);
> +		write_checkpoint(sbi, &cpc);
> +		mutex_unlock(&sbi->gc_mutex);
> +	}
> out:
>  	range->len = cpc.trimmed << sbi->log_blocksize;
>  	return 0;
> --
> 2.1.1
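The batching loop in the patch reduces to a simple shape: split the inclusive range [start_segno, end_segno] into fixed-size chunks and clamp the last chunk to the range end. A standalone sketch that counts one checkpoint per batch (`min_u` stands in for the kernel's `min_t`; the function name is illustrative):

```c
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Returns the number of checkpoints (batches) needed to trim the
 * inclusive segment range [start_segno, end_segno] in chunks of
 * 'batch' segments, clamping the final chunk like the patch does. */
static unsigned int trim_in_batches(unsigned int start_segno,
				    unsigned int end_segno,
				    unsigned int batch)
{
	unsigned int nr_checkpoints = 0;
	unsigned int s;

	for (s = start_segno; s <= end_segno; s += batch) {
		unsigned int trim_start = s;
		unsigned int trim_end = min_u(s + batch - 1, end_segno);

		/* here the kernel code takes gc_mutex and calls
		 * write_checkpoint() for [trim_start, trim_end] */
		(void)trim_start;
		(void)trim_end;
		nr_checkpoints++;
	}
	return nr_checkpoints;
}
```

This also makes Changman's concern concrete: the checkpoint count grows with the range divided by the batch size, so a batch size fixed by the filesystem (rather than chosen by the user) can force more checkpoints than the caller wanted.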
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
On Wed, Jan 21, 2015 at 04:41:17PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@gmail.com]
> > Sent: Tuesday, January 20, 2015 11:06 PM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; Changman Lee; linux-f2fs-de...@lists.sourceforge.net;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
> >
> > Hi Chao,
> >
> > Great works. :)
>
> Thanks! :)
>
> > 2015-01-12 16:14 GMT+09:00 Chao Yu:
> > > This patch adds core functions including slab cache init function and
> > > init/lookup/update/shrink/destroy function for rb-tree based extent cache.
> > >
> > > Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about
> > > detail design and implementation of extent cache.
> > >
> > > Todo:
> > >  * add a cached_ei into struct extent_tree for a quick recent cache.
> > >  * register rb-based extent cache shrink with mm shrink interface.
> > >  * disable dir inode's extent cache.
> > >
> > > Signed-off-by: Chao Yu
> > > Signed-off-by: Jaegeuk Kim
> > > Signed-off-by: Changman Lee
>
> If you do not object, I'd like to keep these, as lots of the details and
> ideas are from you and Jaegeuk.

I have no objection.

> > > ---
> > >  fs/f2fs/data.c | 458 +
> > >  fs/f2fs/node.c |   9 +-
> > >  2 files changed, 466 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > index 4f5b871e..bf8c5eb 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -25,6 +25,9 @@
> > >  #include "trace.h"
> > >  #include <trace/events/f2fs.h>
> > >
> >
> > ~ snip ~
> >
> > > +
> > > +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> > > +					block_t blkaddr)
> > > +{
> > > +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > +	nid_t ino = inode->i_ino;
> > > +	struct extent_tree *et;
> > > +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> > > +	struct extent_node *den = NULL;
> > > +	struct extent_info *pei;
> > > +	struct extent_info ei;
> > > +	unsigned int endofs;
> > > +
> > > +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> > > +		return;
> > > +
> > > +retry:
> > > +	down_write(&sbi->extent_tree_lock);
> > > +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> > > +	if (!et) {
> >
> > We've already made some useful functions.
> > How about using f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?
>
> IMO, we'd better use the original functions kmem_cache_alloc and
> radix_tree_insert, because if we use f2fs_{kmem_cache_alloc,
> radix_tree_insert}, we may loop in these functions without releasing the
> extent_tree_lock lock on OOM, so it will block lock grabbers for a long
> time, which we do not wish to see.

I see. If so, let's use cond_resched() in front of goto retry after
up_write. And also look into kmem_cache_alloc in __insert_extent_tree,
please.

> > > +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> > > +		if (!et) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			goto retry;
> > > +		}
> > > +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			kmem_cache_free(extent_tree_slab, et);
> > > +			goto retry;
> > > +		}
> > > +		memset(et, 0, sizeof(struct extent_tree));
> > > +		et->ino = ino;
> > > +		et->root = RB_ROOT;
> > > +		rwlock_init(&et->lock);
> > > +		atomic_set(&et->refcount, 0);
> > > +		et->count = 0;
> > > +		sbi->total_ext_tree++;
> > > +	}
> > > +	atomic_inc(&et->refcount);
> > > +	up_write(&sbi->extent_tree_lock);
> > > +
> >
> > ~ snip ~
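The allocate-or-retry pattern being discussed hinges on one detail: the writer lock must be dropped before retrying (with a `cond_resched()` in between, per Changman's suggestion), or other lockers starve under OOM. A userspace sketch of that control flow — all names (`lookup_or_create`, `try_alloc`, `slot`, the knob variables) are illustrative, not f2fs code; a simple flag stands in for the rwsem, and its assertion documents that the lock is never held across a retry:

```c
#include <assert.h>
#include <stdlib.h>

static int tree_locked;		/* stands in for extent_tree_lock */
static void *slot;		/* stands in for the radix tree slot */
static int alloc_failures;	/* knob: fail the next N allocations */
static int retries;		/* how many times we looped */

static void lock_tree(void)
{
	assert(!tree_locked);	/* fires if we retry while still holding it */
	tree_locked = 1;
}

static void unlock_tree(void)
{
	tree_locked = 0;
}

static void *try_alloc(void)	/* kmem_cache_alloc(..., GFP_ATOMIC) analog */
{
	if (alloc_failures > 0) {
		alloc_failures--;
		return NULL;
	}
	return malloc(16);
}

static void *lookup_or_create(void)
{
	void *et;
retry:
	lock_tree();
	et = slot;		/* radix_tree_lookup() analog */
	if (!et) {
		et = try_alloc();
		if (!et) {
			unlock_tree();	/* drop the lock first ... */
			retries++;	/* ... then cond_resched(); goto retry */
			goto retry;
		}
		slot = et;	/* radix_tree_insert() analog */
	}
	unlock_tree();
	return et;
}
```

The alternative criticized in the thread — a helper that loops internally on allocation failure — would spin while `tree_locked` is still set, which is exactly the lock-holder stall Chao wanted to avoid.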
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Hi Chao,

Great works. :)

2015-01-12 16:14 GMT+09:00 Chao Yu:
> This patch adds core functions including slab cache init function and
> init/lookup/update/shrink/destroy function for rb-tree based extent cache.
>
> Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about
> detail design and implementation of extent cache.
>
> Todo:
>  * add a cached_ei into struct extent_tree for a quick recent cache.
>  * register rb-based extent cache shrink with mm shrink interface.
>  * disable dir inode's extent cache.
>
> Signed-off-by: Chao Yu
> Signed-off-by: Jaegeuk Kim
> Signed-off-by: Changman Lee
> ---
>  fs/f2fs/data.c | 458 +
>  fs/f2fs/node.c |   9 +-
>  2 files changed, 466 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 4f5b871e..bf8c5eb 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -25,6 +25,9 @@
>  #include "trace.h"
>  #include <trace/events/f2fs.h>
>

~ snip ~

> +
> +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> +					block_t blkaddr)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	nid_t ino = inode->i_ino;
> +	struct extent_tree *et;
> +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> +	struct extent_node *den = NULL;
> +	struct extent_info *pei;
> +	struct extent_info ei;
> +	unsigned int endofs;
> +
> +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> +		return;
> +
> +retry:
> +	down_write(&sbi->extent_tree_lock);
> +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> +	if (!et) {

We've already made some useful functions.
How about using f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?

> +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> +		if (!et) {
> +			up_write(&sbi->extent_tree_lock);
> +			goto retry;
> +		}
> +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> +			up_write(&sbi->extent_tree_lock);
> +			kmem_cache_free(extent_tree_slab, et);
> +			goto retry;
> +		}
> +		memset(et, 0, sizeof(struct extent_tree));
> +		et->ino = ino;
> +		et->root = RB_ROOT;
> +		rwlock_init(&et->lock);
> +		atomic_set(&et->refcount, 0);
> +		et->count = 0;
> +		sbi->total_ext_tree++;
> +	}
> +	atomic_inc(&et->refcount);
> +	up_write(&sbi->extent_tree_lock);
> +

~ snip ~

> +
> +	write_unlock(&et->lock);
> +	atomic_dec(&et->refcount);
> +}
> +
> +void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
> +{
> +	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
> +	struct extent_node *en, *tmp;
> +	unsigned long ino = F2FS_ROOT_INO(sbi);
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	unsigned int found;
> +	unsigned int node_cnt = 0, tree_cnt = 0;
> +
> +	if (available_free_memory(sbi, EXTENT_CACHE))
> +		return;
> +
> +	spin_lock(&sbi->extent_lock);
> +	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
> +		if (!nr_shrink--)
> +			break;
> +		list_del_init(&en->list);
> +	}
> +	spin_unlock(&sbi->extent_lock);
> +

IMHO, it's expensive to retrieve every extent_tree just to free the
extent_nodes whose list_empty() is true. Is there any idea to improve this?
For example, if each extent_node kept its extent_root, it would be faster,
since we would not have to retrieve all the trees. Of course, that would
use more memory.

But I think your patchset might just as well be merged, because the patches
are well made and the feature is cleanly separated behind a mount option.
We could improve this next time.

Regards,
Changman

> +	down_read(&sbi->extent_tree_lock);
> +	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
> +			(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
> +		unsigned i;
> +
> +		ino = treevec[found - 1]->ino + 1;
> +		for (i = 0; i < found; i++) {
> +			struct extent_tree *et = treevec[i];
> +
> +			atomic_inc(&et->refcount);
> +			write_lock(&et->lock);
> +			node_cnt += __free_extent_tree(sbi, et, false);
> +			write_unlock(&et->lock);
> +
Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Hi Chao,

Great works. :)

2015-01-12 16:14 GMT+09:00 Chao Yu <chao2...@samsung.com>:
> This patch adds core functions including the slab cache init function and
> init/lookup/update/shrink/destroy functions for the rb-tree based extent
> cache.
>
> Thanks to Jaegeuk Kim and Changman Lee, who gave many suggestions about the
> detailed design and implementation of the extent cache.
>
> Todo:
>  * add a cached_ei into struct extent_tree for a quick recent cache.
>  * register rb-based extent cache shrink with the mm shrink interface.
>  * disable dir inode's extent cache.
>
> Signed-off-by: Chao Yu <chao2...@samsung.com>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> Signed-off-by: Changman Lee <cm224@samsung.com>
> ---
>  fs/f2fs/data.c | 458 +
>  fs/f2fs/node.c |   9 +-
>  2 files changed, 466 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 4f5b871e..bf8c5eb 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -25,6 +25,9 @@
>  #include "trace.h"
>  #include <trace/events/f2fs.h>

~ snip ~

> +
> +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> +					block_t blkaddr)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +	nid_t ino = inode->i_ino;
> +	struct extent_tree *et;
> +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> +	struct extent_node *den = NULL;
> +	struct extent_info *pei;
> +	struct extent_info ei;
> +	unsigned int endofs;
> +
> +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> +		return;
> +
> +retry:
> +	down_write(&sbi->extent_tree_lock);
> +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> +	if (!et) {

We've already made some useful functions. How about using
f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?

> +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> +		if (!et) {
> +			up_write(&sbi->extent_tree_lock);
> +			goto retry;
> +		}
> +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> +			up_write(&sbi->extent_tree_lock);
> +			kmem_cache_free(extent_tree_slab, et);
> +			goto retry;
> +		}
> +		memset(et, 0, sizeof(struct extent_tree));
> +		et->ino = ino;
> +		et->root = RB_ROOT;
> +		rwlock_init(&et->lock);
> +		atomic_set(&et->refcount, 0);
> +		et->count = 0;
> +		sbi->total_ext_tree++;
> +	}
> +	atomic_inc(&et->refcount);
> +	up_write(&sbi->extent_tree_lock);
> +

~ snip ~

> +
> +	write_unlock(&et->lock);
> +	atomic_dec(&et->refcount);
> +}
> +
> +void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
> +{
> +	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
> +	struct extent_node *en, *tmp;
> +	unsigned long ino = F2FS_ROOT_INO(sbi);
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	unsigned int found;
> +	unsigned int node_cnt = 0, tree_cnt = 0;
> +
> +	if (available_free_memory(sbi, EXTENT_CACHE))
> +		return;
> +
> +	spin_lock(&sbi->extent_lock);
> +	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
> +		if (!nr_shrink--)
> +			break;
> +		list_del_init(&en->list);
> +	}
> +	spin_unlock(&sbi->extent_lock);
> +

IMHO, it's expensive to retrieve all extent_trees just to free the
extent_nodes whose list_empty() is true. Is there any idea to improve this?
For example, if each extent_node had its extent_root, it would be faster
because we would not have to retrieve all the trees. Of course, that uses
more memory.

But I think your patchset might just as well be merged, because the patches
are well made and the feature is cleanly separated behind a mount option. We
could improve this next time.

Regards,
Changman

> +	down_read(&sbi->extent_tree_lock);
> +	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
> +				(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
> +		unsigned i;
> +
> +		ino = treevec[found - 1]->ino + 1;
> +		for (i = 0; i < found; i++) {
> +			struct extent_tree *et = treevec[i];
> +
> +			atomic_inc(&et->refcount);
> +			write_lock(&et->lock);
> +			node_cnt += __free_extent_tree(sbi, et, false);
> +			write_unlock(&et->lock);
> +			atomic_dec(&et->refcount);
> +		}
> +	}
> +	up_read(&sbi->extent_tree_lock);
> +
> +	down_write(&sbi->extent_tree_lock);
> +	radix_tree_for_each_slot(slot, &sbi->extent_tree_root, iter,
> +			F2FS_ROOT_INO(sbi)) {
> +		struct extent_tree *et = (struct extent_tree *)*slot
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:

Hi Changman,

Sorry for replying late!

-----Original Message-----
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Tuesday, December 30, 2014 8:32 AM
To: Jaegeuk Kim
Cc: Chao Yu; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in
sbi, we will face serious lock contention when we access extents belonging to
different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are
managed by the inode itself, to avoid the above lock contention.
Additionally, we manage all extents globally by linking all inodes into a
global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
   * extent tree: link all extents in an rb-tree;
   * rwlock: protect fields when accessing the extent cache concurrently;
   * lru list: sort all extents in access-time order;
   * extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which cache extents.
   * an inode will be added or repositioned in this global list whenever an
     extent is accessed in this inode.
   * use a spinlock to protect this shrink list.

1. How about adding a data structure with an inode number instead of
   referring to the inode pointer?

2. How about managing extent entries globally and setting an upper bound on
   the number of extent entries instead of limiting them per inode?
   (The rb-tree will handle many extents per inode.)

3. It needs to set a minimum length for a candidate of the extent cache.
   (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entries]

The sbi has
1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
2. a radix_tree: ino_entry_for_extents (#10) has D, B in its rb-tree
               ` ino_entry_for_extents (#11) has A, C in its rb-tree
               ` ino_entry_for_extents (#12) has F in its rb-tree
               ` ino_entry_for_extents (#13) has G, E in its rb-tree

In f2fs_update_extent_cache and __get_data_block for #10,
ino_entry_for_extents (#10) is found and D or B is updated. Then, the updated
entries are moved to the MRU end.

In f2fs_evict_inode for #11, A and C are moved to the LRU end. But, if this
inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be
released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount
of consumed memory. Then, it frees any ino_entry_for_extents having no
extent.

IMO, we don't need to consider readahead for this, since get_data_block will
be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO,
the extent cache aims to improve second-access speed rather than initial
cold misses. So maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write
path, extents can be created every time. At that point, if we set an upper
bound, we should decide which extent to evict in favor of the new extents. In
an update, one extent can be separated into 3, which requires 3 insertions
and 1 deletion. So if updates happen frequently, we could give up extent
management for some ranges. And we need to bring in ideas from vm management,
for example active/inactive lists with a second chance for promotion, or
batched work for insertion/deletion. I suddenly thought: 'Simple is best'.
Let's think about better ideas together.

Yeah, how about using the opposite way to the way the page cache manager
works?

For example:
node pages A, B, C, D are in the page cache;
extents a, b, c, d are in the extent cache;
extent a is built from page A, ..., d is built from page D.

page cache:   (LRU) A -> B -> C -> D (MRU)
extent cache: (LRU) a -> b -> c -> d (MRU)

If we use
1) the same way, LRU, the cache pairs A-a, B-b, ... may be reclaimed at the
   same time under OOM.
2) the opposite way, maybe A, B in the page cache and d, c in the extent
   cache will be reclaimed, but we can still hit the whole cache
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in
sbi, we will face serious lock contention when we access extents belonging to
different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are
managed by the inode itself, to avoid the above lock contention.
Additionally, we manage all extents globally by linking all inodes into a
global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
   * extent tree: link all extents in an rb-tree;
   * rwlock: protect fields when accessing the extent cache concurrently;
   * lru list: sort all extents in access-time order;
   * extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which cache extents.
   * an inode will be added or repositioned in this global list whenever an
     extent is accessed in this inode.
   * use a spinlock to protect this shrink list.

1. How about adding a data structure with an inode number instead of
   referring to the inode pointer?

2. How about managing extent entries globally and setting an upper bound on
   the number of extent entries instead of limiting them per inode?
   (The rb-tree will handle many extents per inode.)

3. It needs to set a minimum length for a candidate of the extent cache.
   (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entries]

The sbi has
1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
2. a radix_tree: ino_entry_for_extents (#10) has D, B in its rb-tree
               ` ino_entry_for_extents (#11) has A, C in its rb-tree
               ` ino_entry_for_extents (#12) has F in its rb-tree
               ` ino_entry_for_extents (#13) has G, E in its rb-tree

In f2fs_update_extent_cache and __get_data_block for #10,
ino_entry_for_extents (#10) is found and D or B is updated. Then, the updated
entries are moved to the MRU end.

In f2fs_evict_inode for #11, A and C are moved to the LRU end. But, if this
inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be
released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount
of consumed memory. Then, it frees any ino_entry_for_extents having no
extent.

IMO, we don't need to consider readahead for this, since get_data_block will
be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO,
the extent cache aims to improve second-access speed rather than initial
cold misses. So maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write
path, extents can be created every time. At that point, if we set an upper
bound, we should decide which extent to evict in favor of the new extents. In
an update, one extent can be separated into 3, which requires 3 insertions
and 1 deletion. So if updates happen frequently, we could give up extent
management for some ranges. And we need to bring in ideas from vm management,
for example active/inactive lists with a second chance for promotion, or
batched work for insertion/deletion. I suddenly thought: 'Simple is best'.
Let's think about better ideas together.

Thanks,

2) Limitation:
In one inode, as we split or add extents in the extent cache during
read/write, the extent number will grow, so memory and CPU overhead will
increase. In order to control the overhead of memory and CPU, we try to set
an upper bound to limit the total extent number in each inode. This number is
a global configuration which is visible to all inodes. This number will be
exported to sysfs for configuring according to the user's requirement. By
default, the designed number is 8.

Chao, it's better that the # of extents is controlled globally rather than
limiting extents per inode, as Jaegeuk said, to reduce extent management
overhead.

3) Shrinker:
There are two shrink paths:
a) one is triggered when the extent count exceeds the upper bound of the
   inode's extent cache. We will try to release extent(s) from the head of
   the inode's inner extent lru list until the extent count is equal to the
   upper bound. This operation could be in f2fs_update_extent_cache().
b) the other one is triggered when memory util exceeds a threshold; we try to
   get inodes from the head of the global lru list(s), and release extent(s)
   with a fixed number (by default: 64 extents)
Re: linux-next: Tree for Dec 26 (f2fs)
On Fri, Dec 26, 2014 at 12:59:05PM -0800, Jaegeuk Kim wrote:
> I fixed the merged patch directly.
>
> Changman,
> The patch was initially made by you, so let me know if you have an objection.
>
> Thanks,

Sorry for my mistake. Thanks, Stephen and Jaegeuk.

> On Fri, Dec 26, 2014 at 11:17:15AM -0800, Randy Dunlap wrote:
> > On 12/26/14 00:30, Stephen Rothwell wrote:
> > > Hi all,
> > >
> > > There will only be intermittent releases of linux-next between now and
> > > Jan 5.
> > >
> > > Changes since 20141221:
> >
> > on x86_64:
> > when CONFIG_F2FS_STAT_FS is not enabled:
> >
> > ../fs/f2fs/segment.c: In function 'rewrite_data_page':
> > ../fs/f2fs/segment.c:1233:2: error: implicit declaration of function
> > 'stat_inc_inplace_blocks' [-Werror=implicit-function-declaration]
> >
> > --
> > ~Randy
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
On Mon, Dec 22, 2014 at 11:36:09PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Tue, Dec 23, 2014 at 11:01:39AM +0800, Chao Yu wrote:

Hi Jaegeuk,

-----Original Message-----
From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
Sent: Tuesday, December 23, 2014 7:16 AM
To: Chao Yu
Cc: 'Changman Lee'; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Chao,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:

Hi Changman,

-----Original Message-----
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Monday, December 22, 2014 10:03 AM
To: Chao Yu
Cc: Jaegeuk Kim; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Yu,

Good approach.

Thank you. :)

As you know, however, f2fs breaks extents itself due to COW.

Yes, and sometimes f2fs uses IPU when overwriting; in this condition, by
using this approach we can cache more contiguous mapping extents for better
performance.

Hmm. When f2fs faces this case, there is no chance to make an extent itself
at all.

With the new implementation of this patch, f2fs will build the extent cache
in readpage/readpages.

I don't understand your point exactly. :( If there are no on-disk extents,
it doesn't matter when the caches are built. Could you define what scenarios
you're looking at?

Unlike other filesystems like btrfs, a minimum extent of f2fs can have 4KB
granularity. So we would have lots of extents per inode, and it could lead
to overhead to manage the extents.

Agreed; the more the number of extents grows in one inode, the more memory
pressure and the longer rb-tree operation latency we face. IMO, to solve
this problem, we'd better add a limitation or shrink ability to the extent
cache:
1. limit the extent number per inode with a value set from sysfs, and
   discard extents from the inode's extent lru list if we hit the limit
   (e.g. in FAT, the max number of mapping extents per inode is fixed: 8);
2. add all extents of inodes into a global lru list; we will try to shrink
   this list if we're facing memory pressure.

How do you think? Any better ideas are welcome. :)

Historically, the reason that I added only one small extent cache is that I
wanted to avoid additional data structures having any overhead in the
critical data write path.

Thank you for telling me the history of the original extent cache.

Instead, I intended to use the well-operating node page cache. We need to
consider what the benefit would be when using an extent cache rather than
the existing node page cache.

IMO, the node page cache belongs to the system-level cache; the filesystem
subsystem cannot control it completely. Cached uptodate node pages will be
invalidated by using drop_caches from sysfs, or by the reclaimer of mm,
resulting in more IO when we need these node pages next time.

Yes, that's exactly what I wanted.

The new extent cache belongs to the filesystem-level cache; it is completely
controlled by the filesystem itself. What we can profit from is: on the one
hand, it is used as a first-level cache above the node page cache, which can
also increase the cache hit ratio.

I don't think so. The hit ratio depends on the cache policy. The node page
cache is managed globally by the kernel in LRU manner, so I think this can
show an affordable hit ratio.

On the other hand, it is more stable and controllable than the node page
cache.

It depends on how you can control the extent cache. But I'm not sure that
would be better than the page cache managed by MM. So, my concerns are:

1. Redundant memory overhead:
   The extent cache is likely on top of the node page cache, which will
   consume memory redundantly.

2. CPU overhead:
   On every block address update, it needs to traverse extent cache entries.

3. Effectiveness:
   We have a node page cache that is managed by MM in LRU order. I think
   this provides a good hit ratio, system-wide memory reclaiming algorithms,
   and a well-defined locking mechanism.

4. Cache reclaiming policy:
   a. global approach: it needs to consider lock contention, CPU overhead,
      and a shrinker. I don't think it is better than the page cache.
   b. local approach: there still exist cold misses at the initial read
      operations. After that, how does the extent cache increase the hit
      ratio more than the node page cache does? For example, in the case of
      a pretty normal scenario like open - read - close - open - read ...,
      we can't get benefits from a locally-managed extent cache, while node page
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@samsung.com]
> > Sent: Monday, December 22, 2014 10:03 AM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> >
> > Hi Yu,
> >
> > Good approach.
>
> Thank you. :)
>
> > As you know, however, f2fs breaks extent itself due to COW.
>
> Yes, and sometimes f2fs use IPU when override writing, in this condition,
> by using this approach we can cache more contiguous mapping extent for
> better performance.
>
> > Unlike other filesystem like btrfs, minimum extent of f2fs could have 4KB
> > granularity.
> > So we would have lots of extents per inode and it could lead to overhead
> > to manage extents.
>
> Agree, the more number of extents are growing in one inode, the more memory
> pressure and longer latency operating in rb-tree we are facing.
> IMO, to solve this problem, we'd better to add limitation or shrink ability
> into extent cache:
> 1.limit extent number per inode with the value set from sysfs and discard
> extent from inode's extent lru list if we touch the limitation; (e.g. in
> FAT, max number of mapping extent per inode is fixed: 8)
> 2.add all extents of inodes into a global lru list, we will try to shrink
> this list if we're facing memory pressure.
>
> How do you think? or any better ideas are welcome. :)

I think both of them are considerable options. How about letting the user add
an extent to a selected inode using ioctl or xattr? In the case of read-mostly
files having large size, the user could surely get a benefit although they
are separated into some pieces.

Thanks,

> > Anyway, mount option could be alternative for this patch.
>
> Yes, will do.
>
> Thanks,
> Yu
>
> > On Fri, Dec 19, 2014 at 06:49:29PM +0800, Chao Yu wrote:
> > > Now f2fs have page-block mapping cache which can cache only one extent
> > > mapping between contiguous logical address and physical address.
> > > Normally, this design will work well because f2fs will expand coverage
> > > area of the mapping extent when we write forward sequentially. But when
> > > we write data randomly in Out-Place-Update mode, the extent will be
> > > shorten and hardly be expanded for most time as following reasons:
> > > 1.The short part of extent will be discarded if we break contiguous
> > > mapping in the middle of extent.
> > > 2.The new mapping will be added into mapping cache only at head or tail
> > > of the extent.
> > > 3.We will drop the extent cache when the extent became very fragmented.
> > > 4.We will not update the extent with mapping which we get from readpages
> > > or readpage.
> > >
> > > To solve above problems, this patch adds extent cache base on rb-tree
> > > like other filesystems (e.g.: ext4/btrfs) in f2fs. By this way, f2fs can
> > > support another more effective cache between dnode page cache and disk.
> > > It will supply high hit ratio in the cache with fewer memory when dnode
> > > page cache are reclaimed in environment of low memory.
> > >
> > > Todo:
> > > *introduce mount option for extent cache.
> > > *add shrink ability for extent cache.
> > >
> > > Signed-off-by: Chao Yu
> > > ---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi Yu,

Good approach.
As you know, however, f2fs breaks extent itself due to COW.
Unlike other filesystem like btrfs, minimum extent of f2fs could have 4KB
granularity. So we would have lots of extents per inode and it could lead to
overhead to manage extents.

Anyway, mount option could be alternative for this patch.

On Fri, Dec 19, 2014 at 06:49:29PM +0800, Chao Yu wrote:
> Now f2fs have page-block mapping cache which can cache only one extent
> mapping between contiguous logical address and physical address.
> Normally, this design will work well because f2fs will expand coverage area
> of the mapping extent when we write forward sequentially. But when we write
> data randomly in Out-Place-Update mode, the extent will be shorten and
> hardly be expanded for most time as following reasons:
> 1.The short part of extent will be discarded if we break contiguous mapping
> in the middle of extent.
> 2.The new mapping will be added into mapping cache only at head or tail of
> the extent.
> 3.We will drop the extent cache when the extent became very fragmented.
> 4.We will not update the extent with mapping which we get from readpages or
> readpage.
>
> To solve above problems, this patch adds extent cache base on rb-tree like
> other filesystems (e.g.: ext4/btrfs) in f2fs. By this way, f2fs can support
> another more effective cache between dnode page cache and disk. It will
> supply high hit ratio in the cache with fewer memory when dnode page cache
> are reclaimed in environment of low memory.
>
> Todo:
> *introduce mount option for extent cache.
> *add shrink ability for extent cache.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/data.c  | 348 +---
>  fs/f2fs/debug.c |   2 +
>  fs/f2fs/f2fs.h  |  49
>  fs/f2fs/inode.c |   5 +-
>  fs/f2fs/super.c |  11 +-
>  5 files changed, 291 insertions(+), 124 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7ec697b..20592e2 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -24,6 +24,8 @@
>  #include "segment.h"
>  #include <trace/events/f2fs.h>
>
> +struct kmem_cache *extent_info_cache;
> +
>  static void f2fs_read_end_io(struct bio *bio, int err)
>  {
>  	struct bio_vec *bvec;
> @@ -247,126 +249,264 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index)
>  	return err;
>  }
>
> -static int check_extent_cache(struct inode *inode, pgoff_t pgofs,
> -					struct buffer_head *bh_result)
> +static struct extent_info *__insert_extent_cache(struct inode *inode,
> +				unsigned int fofs, unsigned int len, u32 blk)
>  {
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
> -	pgoff_t start_fofs, end_fofs;
> -	block_t start_blkaddr;
> -
> -	if (is_inode_flag_set(fi, FI_NO_EXTENT))
> -		return 0;
> -
> -	read_lock(&fi->ext.ext_lock);
> -	if (fi->ext.len == 0) {
> -		read_unlock(&fi->ext.ext_lock);
> -		return 0;
> +	struct rb_root *root = &F2FS_I(inode)->ei_tree.root;
> +	struct rb_node *p = root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct extent_info *ei;
> +
> +	while (p) {
> +		parent = p;
> +		ei = rb_entry(parent, struct extent_info, rb_node);
> +
> +		if (fofs < ei->fofs)
> +			p = p->rb_left;
> +		else if (fofs >= ei->fofs + ei->len)
> +			p = p->rb_right;
> +		else
> +			f2fs_bug_on(F2FS_I_SB(inode), 1);
>  	}
>
> -	stat_inc_total_hit(inode->i_sb);
> +	ei = kmem_cache_alloc(extent_info_cache, GFP_ATOMIC);
> +	ei->fofs = fofs;
> +	ei->blk = blk;
> +	ei->len = len;
> +
> +	rb_link_node(&ei->rb_node, parent, &p);
> +	rb_insert_color(&ei->rb_node, root);
> +	stat_inc_extent_count(inode->i_sb);
> +	return ei;
> +}
>
> -	start_fofs = fi->ext.fofs;
> -	end_fofs = fi->ext.fofs + fi->ext.len - 1;
> -	start_blkaddr = fi->ext.blk_addr;
> +static bool __remove_extent_cache(struct inode *inode, unsigned int fofs,
> +					struct extent_info *cei)
> +{
> +	struct rb_root *root = &F2FS_I(inode)->ei_tree.root;
> +	struct rb_node *p = root->rb_node;
> +	struct extent_info *ei;
>
> -	if (pgofs >= start_fofs && pgofs <= end_fofs) {
> -		unsigned int blkbits = inode->i_sb->s_blocksize_bits;
> -		size_t count;
> +	while (p) {
> +		ei = rb_entry(p, struct extent_info, rb_node);
>
> -		clear_buffer_new(bh_result);
> -		map_bh(bh_result, inode->i_sb,
> -				start_blkaddr + pgofs - start_fofs);
> -		count = end_fofs - pgofs + 1;
> -		if (count < (UINT_MAX >> blkbits))
> -			bh_result->b_size = (count << blkbits);
> +		if (fofs < ei->fofs)
> +			p = p->rb_left;
> +		else if (fofs >=
Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in struct node_info to reduce memory cost
On Thu, Dec 18, 2014 at 02:29:51PM +0800, Chao Yu wrote:
> Hi Changman,
>
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224@gmail.com]
> > Sent: Wednesday, December 17, 2014 11:09 PM
> > To: Chao Yu
> > Cc: Jaegeuk Kim; Changman Lee; linux-fsde...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in
> > struct node_info to reduce memory cost
> >
> > Hi Yu,
> >
> > This patch is effective only in 32 bit machine. In case of 64 bit
> > machine, nat_entry will be aligned in 8 bytes due to pointer variable
> > (i.e. struct list_head). So it can't get any benefit to reduce memory
> > usage. In the case of node_info, however, it will be gain in terms of
> > memory usage.
> > Hence, I think it's not correct for commit log to describe this patch.
>
> Thanks for your review! :)
>
> AFAIK, in 64 bit machine, size of struct nat_entry is 40 bytes before this
> patch apply, the reason is that our compiler will fill 3 bytes pads after
> flag as nid's offset should align to type size of nid, and then fill 7 byte
> pads after version as size of structure should align to 64 bits when the
> struct size is bigger than 64 bits.
> layout of struct nat_entry:
> |-8 bytes-|
> |list.next|
> |list.prev|
> |flag|nid |
> |ino |blk_addr|
> |version  |
> After we apply this patch, size of struct nat_entry will be reduced to 32
> bytes.
> Please correct me if I'm wrong.

Hi,

Sorry, you're right. I miscalculated.

Thanks,

> Anyway, I agreed that commit log should be uptodate.
>
> Thanks,
> Yu
>
> > Thanks,
> >
> > Reviewed-by: Changman Lee
Re: [f2fs-dev] [PATCH v2] f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Hi,

Is there any reason to use truncate_inode_pages_range instead of
invalidate_mapping_pages?
IMHO, it seems nice to just use invalidate_mapping_pages because pages of
meta_inode shouldn't be dirty, locked, under writeback or mapped in this
function. If there is my misunderstanding, let me know.

Thanks,

Reviewed-by: Changman Lee

2014-12-17 19:10 GMT+09:00 Chao Yu:
> Use more common function ra_meta_pages() with META_POR to readahead node
> blocks in restore_node_summary() instead of ra_sum_pages(), hence we can
> simplify the readahead code there, and also we can remove unused function
> ra_sum_pages().
>
> changes from v1:
>  o fix one bug when using truncate_inode_pages_range which is pointed out
>    by Jaegeuk Kim.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/node.c | 68 +-
>  1 file changed, 15 insertions(+), 53 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 5aa54a0..ab48b4c 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1726,80 +1726,42 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
>  	return 0;
>  }
>
> -/*
> - * ra_sum_pages() merge contiguous pages into one bio and submit.
> - * these pre-read pages are allocated in bd_inode's mapping tree.
> - */
> -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages,
> -				int start, int nrpages)
> -{
> -	struct inode *inode = sbi->sb->s_bdev->bd_inode;
> -	struct address_space *mapping = inode->i_mapping;
> -	int i, page_idx = start;
> -	struct f2fs_io_info fio = {
> -		.type = META,
> -		.rw = READ_SYNC | REQ_META | REQ_PRIO
> -	};
> -
> -	for (i = 0; page_idx < start + nrpages; page_idx++, i++) {
> -		/* alloc page in bd_inode for reading node summary info */
> -		pages[i] = grab_cache_page(mapping, page_idx);
> -		if (!pages[i])
> -			break;
> -		f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio);
> -	}
> -
> -	f2fs_submit_merged_bio(sbi, META, READ);
> -	return i;
> -}
> -
>  int restore_node_summary(struct f2fs_sb_info *sbi,
>  			unsigned int segno, struct f2fs_summary_block *sum)
>  {
>  	struct f2fs_node *rn;
>  	struct f2fs_summary *sum_entry;
> -	struct inode *inode = sbi->sb->s_bdev->bd_inode;
>  	block_t addr;
>  	int bio_blocks = MAX_BIO_BLOCKS(sbi);
> -	struct page *pages[bio_blocks];
> -	int i, idx, last_offset, nrpages, err = 0;
> +	int i, idx, last_offset, nrpages;
>
>  	/* scan the node segment */
>  	last_offset = sbi->blocks_per_seg;
>  	addr = START_BLOCK(sbi, segno);
>  	sum_entry = &sum->entries[0];
>
> -	for (i = 0; !err && i < last_offset; i += nrpages, addr += nrpages) {
> +	for (i = 0; i < last_offset; i += nrpages, addr += nrpages) {
>  		nrpages = min(last_offset - i, bio_blocks);
>
>  		/* readahead node pages */
> -		nrpages = ra_sum_pages(sbi, pages, addr, nrpages);
> -		if (!nrpages)
> -			return -ENOMEM;
> +		ra_meta_pages(sbi, addr, nrpages, META_POR);
>
> -		for (idx = 0; idx < nrpages; idx++) {
> -			if (err)
> -				goto skip;
> +		for (idx = addr; idx < addr + nrpages; idx++) {
> +			struct page *page = get_meta_page(sbi, idx);
>
> -			lock_page(pages[idx]);
> -			if (unlikely(!PageUptodate(pages[idx]))) {
> -				err = -EIO;
> -			} else {
> -				rn = F2FS_NODE(pages[idx]);
> -				sum_entry->nid = rn->footer.nid;
> -				sum_entry->version = 0;
> -				sum_entry->ofs_in_node = 0;
> -				sum_entry++;
> -			}
> -			unlock_page(pages[idx]);
> -skip:
> -			page_cache_release(pages[idx]);
> +			rn = F2FS_NODE(page);
> +			sum_entry->nid = rn->footer.nid;
> +			sum_entry->version = 0;
> +			sum_entry->ofs_in_node = 0;
> +			sum_entry++;
> +			f2fs_put_page(page, 1);
>  		}
>
> -		invalidate_mapping_pages(inode->i_map
Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in struct node_info to reduce memory cost
Hi Yu,

This patch is effective only in 32 bit machine. In case of 64 bit machine,
nat_entry will be aligned in 8 bytes due to pointer variable (i.e. struct
list_head). So it can't get any benefit to reduce memory usage. In the case
of node_info, however, it will be gain in terms of memory usage.
Hence, I think it's not correct for commit log to describe this patch.

Thanks,

Reviewed-by: Changman Lee

2014-12-15 18:33 GMT+09:00 Chao Yu:
> This patch moves one member of struct nat_entry: _flag_ to struct node_info,
> so _version_ in struct node_info and _flag_ with unsigned char type will
> merge to one 32-bit space in register/memory. Then the size of nat_entry
> will reduce its size from 28 bytes to 24 bytes and slab memory using by
> f2fs will be reduced.
>
> changes from v1:
>  o introduce inline copy_node_info() to copy valid data from node info
>    suggested by Jaegeuk Kim, it can avoid bug.
>
> Signed-off-by: Chao Yu
> ---
>  fs/f2fs/node.c |  4 ++--
>  fs/f2fs/node.h | 33 ++---
>  2 files changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index f83326c..5aa54a0 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -268,7 +268,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
>  	e = __lookup_nat_cache(nm_i, ni->nid);
>  	if (!e) {
>  		e = grab_nat_entry(nm_i, ni->nid);
> -		e->ni = *ni;
> +		copy_node_info(&e->ni, ni);
>  		f2fs_bug_on(sbi, ni->blk_addr == NEW_ADDR);
>  	} else if (new_blkaddr == NEW_ADDR) {
>  		/*
> @@ -276,7 +276,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
>  		 * previous nat entry can be remained in nat cache.
>  		 * So, reinitialize it with new information.
>  		 */
> -		e->ni = *ni;
> +		copy_node_info(&e->ni, ni);
>  		f2fs_bug_on(sbi, ni->blk_addr != NULL_ADDR);
>  	}
>
> diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
> index d10b644..eb59167 100644
> --- a/fs/f2fs/node.h
> +++ b/fs/f2fs/node.h
> @@ -29,6 +29,14 @@
>  /* return value for read_node_page */
>  #define LOCKED_PAGE	1
>
> +/* For flag in struct node_info */
> +enum {
> +	IS_CHECKPOINTED,	/* is it checkpointed before? */
> +	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
> +	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
> +	IS_DIRTY,		/* this nat entry is dirty? */
> +};
> +
>  /*
>   * For node information
>   */
> @@ -37,18 +45,11 @@ struct node_info {
>  	nid_t ino;		/* inode number of the node's owner */
>  	block_t blk_addr;	/* block address of the node */
>  	unsigned char version;	/* version of the node */
> -};
> -
> -enum {
> -	IS_CHECKPOINTED,	/* is it checkpointed before? */
> -	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
> -	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
> -	IS_DIRTY,		/* this nat entry is dirty? */
> +	unsigned char flag;	/* for node information bits */
>  };
>
>  struct nat_entry {
>  	struct list_head list;	/* for clean or dirty nat list */
> -	unsigned char flag;	/* for node information bits */
>  	struct node_info ni;	/* in-memory node information */
>  };
>
> @@ -63,20 +64,30 @@ struct nat_entry {
>
>  #define inc_node_version(version)	(++version)
>
> +static inline void copy_node_info(struct node_info *dst,
> +						struct node_info *src)
> +{
> +	dst->nid = src->nid;
> +	dst->ino = src->ino;
> +	dst->blk_addr = src->blk_addr;
> +	dst->version = src->version;
> +	/* should not copy flag here */
> +}
> +
>  static inline void set_nat_flag(struct nat_entry *ne,
>  				unsigned int type, bool set)
>  {
>  	unsigned char mask = 0x01 << type;
>  	if (set)
> -		ne->flag |= mask;
> +		ne->ni.flag |= mask;
>  	else
> -		ne->flag &= ~mask;
> +		ne->ni.flag &= ~mask;
>  }
>
>  static inline bool get_nat_flag(struct nat_entry *ne, unsigned int type)
>  {
>  	unsigned char mask = 0x01 << type;
> -	return ne->flag & mask;
> +	return ne->ni.flag & mask;
>  }
>
>  static inline void nat_reset_flag(struct nat_entry *ne)
> --
> 2.1.2
Re: [f2fs-dev] [PATCH v2] f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Hi, Is there any reason to use truncate_inode_pages_range instead of invalidate_mapping_pages? IMHO, it seems nice to just use invalidate_mapping_pages because pages of meta_inode shouldn't be dirty, locked, under writeback or mapped in this function. If there is my misunderstanding, let me know. Thanks,

Reviewed-by: Changman Lee cm224@samsung.com

2014-12-17 19:10 GMT+09:00 Chao Yu chao2...@samsung.com:

Use more common function ra_meta_pages() with META_POR to readahead node blocks in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the readahead code there, and also we can remove unused function ra_sum_pages().

changes from v1:
o fix one bug when using truncate_inode_pages_range which is pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu chao2...@samsung.com
---
 fs/f2fs/node.c | 68 +-
 1 file changed, 15 insertions(+), 53 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 5aa54a0..ab48b4c 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1726,80 +1726,42 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
 	return 0;
 }

-/*
- * ra_sum_pages() merge contiguous pages into one bio and submit.
- * these pre-read pages are allocated in bd_inode's mapping tree.
- */
-static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages,
-				int start, int nrpages)
-{
-	struct inode *inode = sbi->sb->s_bdev->bd_inode;
-	struct address_space *mapping = inode->i_mapping;
-	int i, page_idx = start;
-	struct f2fs_io_info fio = {
-		.type = META,
-		.rw = READ_SYNC | REQ_META | REQ_PRIO
-	};
-
-	for (i = 0; page_idx < start + nrpages; page_idx++, i++) {
-		/* alloc page in bd_inode for reading node summary info */
-		pages[i] = grab_cache_page(mapping, page_idx);
-		if (!pages[i])
-			break;
-		f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio);
-	}
-
-	f2fs_submit_merged_bio(sbi, META, READ);
-	return i;
-}
-
 int restore_node_summary(struct f2fs_sb_info *sbi,
 			unsigned int segno, struct f2fs_summary_block *sum)
 {
 	struct f2fs_node *rn;
 	struct f2fs_summary *sum_entry;
-	struct inode *inode = sbi->sb->s_bdev->bd_inode;
 	block_t addr;
 	int bio_blocks = MAX_BIO_BLOCKS(sbi);
-	struct page *pages[bio_blocks];
-	int i, idx, last_offset, nrpages, err = 0;
+	int i, idx, last_offset, nrpages;

 	/* scan the node segment */
 	last_offset = sbi->blocks_per_seg;
 	addr = START_BLOCK(sbi, segno);
 	sum_entry = &sum->entries[0];

-	for (i = 0; !err && i < last_offset; i += nrpages, addr += nrpages) {
+	for (i = 0; i < last_offset; i += nrpages, addr += nrpages) {
 		nrpages = min(last_offset - i, bio_blocks);

 		/* readahead node pages */
-		nrpages = ra_sum_pages(sbi, pages, addr, nrpages);
-		if (!nrpages)
-			return -ENOMEM;
+		ra_meta_pages(sbi, addr, nrpages, META_POR);

-		for (idx = 0; idx < nrpages; idx++) {
-			if (err)
-				goto skip;
+		for (idx = addr; idx < addr + nrpages; idx++) {
+			struct page *page = get_meta_page(sbi, idx);

-			lock_page(pages[idx]);
-			if (unlikely(!PageUptodate(pages[idx]))) {
-				err = -EIO;
-			} else {
-				rn = F2FS_NODE(pages[idx]);
-				sum_entry->nid = rn->footer.nid;
-				sum_entry->version = 0;
-				sum_entry->ofs_in_node = 0;
-				sum_entry++;
-			}
-			unlock_page(pages[idx]);
-skip:
-			page_cache_release(pages[idx]);
+			rn = F2FS_NODE(page);
+			sum_entry->nid = rn->footer.nid;
+			sum_entry->version = 0;
+			sum_entry->ofs_in_node = 0;
+			sum_entry++;
+			f2fs_put_page(page, 1);
 		}

-		invalidate_mapping_pages(inode->i_mapping, addr,
-							addr + nrpages);
+		truncate_inode_pages_range(META_MAPPING(sbi),
+			addr << PAGE_CACHE_SHIFT,
+			((addr + nrpages) << PAGE_CACHE_SHIFT) - 1);
 	}
-	return err;
+	return 0;
 }

 static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
--
2.1.2
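The rewritten loop above processes a node segment in bio-sized chunks: each iteration advances by `nrpages = min(remaining, bio_blocks)`, issues readahead for the whole chunk, then consumes it page by page. A small stand-alone sketch of that chunking pattern (function names and `MAX_CHUNK` are made up for illustration; this is not f2fs code):

```c
#include <assert.h>

#define MAX_CHUNK 8	/* stands in for MAX_BIO_BLOCKS(sbi) */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Visit blocks [start, start + total) in chunks of at most MAX_CHUNK,
 * the way restore_node_summary() walks a node segment: one readahead
 * call per chunk, then a per-block consume loop.  Returns the number
 * of chunks issued; *visited counts blocks consumed. */
static int walk_in_chunks(int start, int total, int *visited)
{
	int chunks = 0;
	int i, nr, idx;

	for (i = 0; i < total; i += nr) {
		nr = min_int(total - i, MAX_CHUNK);
		/* kernel analogue: ra_meta_pages(sbi, addr, nr, META_POR); */
		chunks++;
		for (idx = start + i; idx < start + i + nr; idx++)
			(*visited)++;	/* get_meta_page() + read + put */
	}
	return chunks;
}
```

With `total = 20` and `MAX_CHUNK = 8`, the walk issues three chunks (8 + 8 + 4) and touches every block exactly once, which is the invariant the loop indices `i`, `addr`, and `nrpages` maintain in the patch.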
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon, Thanks very much for your interest. It becomes more clear due to your explanation. Regards, Changman On Tue, Nov 25, 2014 at 08:05:23PM +0100, Simon Baatz wrote: > Hi Changman, > > On Mon, Nov 24, 2014 at 11:46:46AM +0900, Changman Lee wrote: > > Hi Simon, > > Thanks for your explanation kindly. > > > > On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote: > > > Hi Changman, Jaegeuk, > > > > > > On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote: > > > > On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > > > > > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > > > > > Hi Jaegeuk, > > > > > > > > > > > > We should call flush_dcache_page before kunmap because the purpose > > > > > > of the cache flush is to address aliasing problem related to > > > > > > virtual address. > > > > > > > > > > Oh, I just followed zero_user_segments below. > > > > > > > > > > static inline void zero_user_segments(struct page *page, > > > > > unsigned start1, unsigned end1, > > > > > unsigned start2, unsigned end2) > > > > > { > > > > > void *kaddr = kmap_atomic(page); > > > > > > > > > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > > > > > > > > > if (end1 > start1) > > > > > memset(kaddr + start1, 0, end1 - start1); > > > > > > > > > > if (end2 > start2) > > > > > memset(kaddr + start2, 0, end2 - start2); > > > > > > > > > > kunmap_atomic(kaddr); > > > > > flush_dcache_page(page); > > > > > } > > > > > > > > > > Is this a wrong reference? Or, a bug? > > > > > > > > > > > > > Well.. Data in cache only have to be flushed until before other users > > > > read the data. > > > > If so, it's not a bug. > > > > > > > > > > Yes, it is not a bug, since flush_dcache_page() needs to be able to > > > deal with non-kmapped pages. However, this may create overhead in > > > some situations. > > > > > > > Previously, I was vague but I thought that it should be different > > according to vaddr exists or not. 
So I told jaegeuk that it should > > be better to change an order between flush_dache_page and kunmap. > > But actually, it doesn't matter the order between them except > > the situation you said. > > Could you explain the situation that makes overhead by flushing after > > kummap. > > I can't imagine it by just seeing flush_dcache_page code. > > > > I was a not very precise here. Yes, flush_dcache_page() on ARM does > the same in both situations since it has no idea whether it is called > before or after kunmap. However, flush_kernel_dcache_page() can > assume that it is called before kunmap and thus, for example, does not > need to pin a highmem page by kmap_high_get() (apart from not having > to care about flushing user space mappings) > > > > According to documentation (see Documentation/cachetlb.txt), this is > > > a use for flush_kernel_dcache_page(), since the page has been > > > modified by the kernel only. In contrast to flush_dcache_page(), > > > this function must be called before kunmap(). > > > > > > flush_kernel_dcache_page() does not need to flush the user space > > > aliases. Additionally, at least on ARM, it does not flush at all > > > when called within kmap_atomic()/kunmap_atomic(), when > > > kunmap_atomic() is going to flush the page anyway. (I know that > > > almost no one uses flush_kernel_dcache_page() (probably because > > > almost no one knows when to use which of the two functions), but it > > > may save a few cache flushes on architectures which are affected by > > > aliasing) > > > > > > > > > > > Anyway I modified as below. > > > > > > > > > > Thanks, > > > > > > > > > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 > > > > > >2001 > > > > > From: Jaegeuk Kim > > > > > Date: Tue, 18 Nov 2014 10:50:21 -0800 > > > > > Subject: [PATCH] f2fs: call flush_dcache_page when the page was > > > > > updated > > > > >
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon, Thanks for your kind explanation. On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote: > Hi Changman, Jaegeuk, > > On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote: > > On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > > > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > > > Hi Jaegeuk, > > > > > > > > We should call flush_dcache_page before kunmap because the purpose of > > > > the cache flush is to address aliasing problem related to virtual > > > > address. > > > > > > Oh, I just followed zero_user_segments below. > > > > > > static inline void zero_user_segments(struct page *page, > > > unsigned start1, unsigned end1, > > > unsigned start2, unsigned end2) > > > { > > > void *kaddr = kmap_atomic(page); > > > > > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > > > > > if (end1 > start1) > > > memset(kaddr + start1, 0, end1 - start1); > > > > > > if (end2 > start2) > > > memset(kaddr + start2, 0, end2 - start2); > > > > > > kunmap_atomic(kaddr); > > > flush_dcache_page(page); > > > } > > > > > > Is this a wrong reference? Or, a bug? > > > > > > > Well.. Data in cache only have to be flushed until before other users read > > the data. > > If so, it's not a bug. > > > > Yes, it is not a bug, since flush_dcache_page() needs to be able to > deal with non-kmapped pages. However, this may create overhead in > some situations. > Previously, I was vague, but I thought it should differ depending on whether the vaddr exists or not. So I told Jaegeuk that it would be better to change the order between flush_dcache_page and kunmap. But actually, the order between them doesn't matter except in the situation you described. Could you explain the situation where flushing after kunmap creates overhead? I can't see it just from reading the flush_dcache_page code.
In contrast to flush_dcache_page(), > this function must be called before kunmap(). > > flush_kernel_dcache_page() does not need to flush the user space > aliases. Additionally, at least on ARM, it does not flush at all > when called within kmap_atomic()/kunmap_atomic(), when > kunmap_atomic() is going to flush the page anyway. (I know that > almost no one uses flush_kernel_dcache_page() (probably because > almost no one knows when to use which of the two functions), but it > may save a few cache flushes on architectures which are affected by > aliasing) > > > > > Anyway I modified as below. > > > > > > Thanks, > > > > > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001 > > > From: Jaegeuk Kim > > > Date: Tue, 18 Nov 2014 10:50:21 -0800 > > > Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated > > > > > > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/dir.c| 7 ++- > > > fs/f2fs/inline.c | 2 ++ > > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > > > index 5a49995..fabf4ee 100644 > > > --- a/fs/f2fs/dir.c > > > +++ b/fs/f2fs/dir.c > > > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > > > f2fs_dir_entry *de, > > > f2fs_wait_on_page_writeback(page, type); > > > de->ino = cpu_to_le32(inode->i_ino); > > > set_de_type(de, inode); > > > - if (!f2fs_has_inline_dentry(dir)) > > > + if (!f2fs_has_inline_dentry(dir)) { > > > + flush_dcache_page(page); > > > kunmap(page); > > > + } > > Is this a page that may be mapped into user space? (I may be > completely wrong here, since I have no idea how this code works. But > it looks like as if the answer is "no" ;-) ). > > It is not necessary to flush pages that cannot be seen by user space > (see also the NOTE in the documentation of flush_dcache_page() in > cachetlb.txt). 
Thus, if you know that a page will not be mapped into > user space, please don't create the overhead of flushing it. > In the case of a dentry page, unlike inline data, it is not mapped to user space, so the dcache flush adds overhead. Is that what you mean? Best regards, Changman > > - Simon -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
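The aliasing issue being debated above can be illustrated with a toy write-back model (a pure user-space simulation with made-up names, not kernel code): a store through one alias of the page, e.g. the kernel's kmap address, sits in that alias's cache and is invisible through a second alias, e.g. a user-space mapping, until it is written back.

```c
#include <string.h>

/* Toy model: "memory" is the page, "cache" is a dirty write-back cache
 * in front of the kernel alias.  A reader through a different virtual
 * alias sees memory, not the writer's cache, until a flush writes the
 * data back.  This only illustrates the ordering argument; real cache
 * behavior is per-line and architecture-specific. */
static char memory[16];
static char cache[16];
static int dirty;

/* Store through the kernel alias: lands in the cache only. */
static void kernel_write(const char *src, size_t n)
{
	memcpy(cache, src, n);
	dirty = 1;
}

/* The flush_dcache_page() analogue: write the dirty data back so other
 * aliases can see it. */
static void flush_dcache(void)
{
	if (dirty)
		memcpy(memory, cache, sizeof(memory));
	dirty = 0;
}

/* Read through the other alias: sees only what reached memory. */
static char user_read(size_t i)
{
	return memory[i];
}
```

This is why the thread concludes the flush only has to happen before another user reads the data; whether it comes before or after kunmap changes nothing in this model, which is the point Simon and Changman converge on (the kunmap-ordering distinction only matters for flush_kernel_dcache_page()).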
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: > > Hi Jaegeuk, > > > > We should call flush_dcache_page before kunmap because the purpose of the > > cache flush is to address aliasing problem related to virtual address. > > Oh, I just followed zero_user_segments below. > > static inline void zero_user_segments(struct page *page, > unsigned start1, unsigned end1, > unsigned start2, unsigned end2) > { > void *kaddr = kmap_atomic(page); > > BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE); > > if (end1 > start1) > memset(kaddr + start1, 0, end1 - start1); > > if (end2 > start2) > memset(kaddr + start2, 0, end2 - start2); > > kunmap_atomic(kaddr); > flush_dcache_page(page); > } > > Is this a wrong reference? Or, a bug? > Well.. Data in cache only have to be flushed until before other users read the data. If so, it's not a bug. > Anyway I modified as below. > > Thanks, > > >From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001 > From: Jaegeuk Kim > Date: Tue, 18 Nov 2014 10:50:21 -0800 > Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated > > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. 
> > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/dir.c | 7 ++- > fs/f2fs/inline.c | 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..fabf4ee 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > + flush_dcache_page(page); > kunmap(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -365,6 +367,7 @@ static int make_empty_dir(struct inode *inode, > make_dentry_ptr(&d, (void *)dentry_blk, 1); > do_make_empty_dir(inode, parent, &d); > > + flush_dcache_page(dentry_page); > kunmap_atomic(dentry_blk); > > set_page_dirty(dentry_page); > @@ -578,6 +581,7 @@ fail: > update_inode_page(dir); > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > + flush_dcache_page(dentry_page); > kunmap(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > @@ -660,6 +664,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, > struct page *page, > bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, > NR_DENTRY_IN_BLOCK, > 0); > + flush_dcache_page(page); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > set_page_dirty(page); > > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..4291c1f 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -106,6 +106,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, > struct page *page) > src_addr = inline_data_addr(dn->inode_page); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > + flush_dcache_page(page); > kunmap_atomic(dst_addr); > SetPageUptodate(page); > no_update: > @@ -357,6 +358,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, > struct page *ipage, >
memcpy(dentry_blk->filename, inline_dentry->filename, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > > + flush_dcache_page(page); > kunmap_atomic(dentry_blk); > SetPageUptodate(page); > set_page_dirty(page); > -- > 2.1.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Jaegeuk, We should call flush_dcache_page before kunmap because the purpose of the cache flush is to address the aliasing problem related to virtual addresses. On Wed, Nov 19, 2014 at 02:35:08PM -0800, Jaegeuk Kim wrote: > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/dir.c | 7 ++- > fs/f2fs/inline.c | 4 +++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..312fbfc 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct > f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > kunmap(page); > + flush_dcache_page(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -366,6 +368,7 @@ static int make_empty_dir(struct inode *inode, > do_make_empty_dir(inode, parent, &d); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(dentry_page); > > set_page_dirty(dentry_page); > f2fs_put_page(dentry_page, 1); > @@ -579,6 +582,7 @@ fail: > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > kunmap(dentry_page); > + flush_dcache_page(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > } > @@ -661,6 +665,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, > struct page *page, > NR_DENTRY_IN_BLOCK, > 0); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > + flush_dcache_page(page); > set_page_dirty(page); > > dir->i_ctime = dir->i_mtime = CURRENT_TIME; > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..8b7cc51 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -45,8 +45,8 @@ void read_inline_data(struct page *page, struct page *ipage) > src_addr = inline_data_addr(ipage); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr,
MAX_INLINE_DATA); > - flush_dcache_page(page); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > } > > @@ -107,6 +107,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, > struct page *page) > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > no_update: > /* write data page to try to make data consistent */ > @@ -358,6 +359,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, > struct page *ipage, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(page); > SetPageUptodate(page); > set_page_dirty(page); > > -- > 2.1.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Jaegeuk,

We should call flush_dcache_page before kunmap because the purpose of the cache flush is to address the aliasing problem related to virtual addresses.

On Wed, Nov 19, 2014 at 02:35:08PM -0800, Jaegeuk Kim wrote: > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. > > Signed-off-by: Jaegeuk Kim jaeg...@kernel.org > --- > fs/f2fs/dir.c| 7 ++- > fs/f2fs/inline.c | 4 +++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 5a49995..312fbfc 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, > f2fs_wait_on_page_writeback(page, type); > de->ino = cpu_to_le32(inode->i_ino); > set_de_type(de, inode); > - if (!f2fs_has_inline_dentry(dir)) > + if (!f2fs_has_inline_dentry(dir)) { > kunmap(page); > + flush_dcache_page(page); > + } > set_page_dirty(page); > dir->i_mtime = dir->i_ctime = CURRENT_TIME; > mark_inode_dirty(dir); > @@ -366,6 +368,7 @@ static int make_empty_dir(struct inode *inode, > do_make_empty_dir(inode, parent, &d); > > kunmap_atomic(dentry_blk); > + flush_dcache_page(dentry_page); > set_page_dirty(dentry_page); > f2fs_put_page(dentry_page, 1); > @@ -579,6 +582,7 @@ fail: > clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR); > } > kunmap(dentry_page); > + flush_dcache_page(dentry_page); > f2fs_put_page(dentry_page, 1); > return err; > } > @@ -661,6 +665,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, > NR_DENTRY_IN_BLOCK, 0); > kunmap(page); /* kunmap - pair of f2fs_find_entry */ > + flush_dcache_page(page); > set_page_dirty(page); > > dir->i_ctime = dir->i_mtime = CURRENT_TIME; > diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c > index f26fb87..8b7cc51 100644 > --- a/fs/f2fs/inline.c > +++ b/fs/f2fs/inline.c > @@ -45,8 +45,8 @@ void read_inline_data(struct page *page, struct page *ipage) > src_addr = inline_data_addr(ipage); > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > - flush_dcache_page(page); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > } > @@ -107,6 +107,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) > dst_addr = kmap_atomic(page); > memcpy(dst_addr, src_addr, MAX_INLINE_DATA); > kunmap_atomic(dst_addr); > + flush_dcache_page(page); > SetPageUptodate(page); > no_update: > /* write data page to try to make data consistent */ > @@ -358,6 +359,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage, > NR_INLINE_DENTRY * F2FS_SLOT_LEN); > kunmap_atomic(dentry_blk); > + flush_dcache_page(page); > SetPageUptodate(page); > set_page_dirty(page); > -- > 2.1.1
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
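The ordering question above (which side of kunmap() the dcache flush belongs on) can be sketched in plain userspace C. The *_stub helpers below are stand-ins for the kernel API, not the real implementation; the sketch only records the order of operations that the patch uses, without deciding the review question:

```c
#include <assert.h>
#include <string.h>

/* Stand-ins (assumptions, not the kernel API) that record operation order. */
static char page_mem[64];   /* fake page payload */
static char trace[8];       /* m = kmap, u = kunmap, f = flush_dcache */
static int ti;

static void *kmap_stub(void)        { trace[ti++] = 'm'; return page_mem; }
static void kunmap_stub(void)       { trace[ti++] = 'u'; }
static void flush_dcache_stub(void) { trace[ti++] = 'f'; }

/* Order used by the patch: modify through the mapping, unmap, then flush.
 * Changman's review argues the flush should come before the unmap on
 * virtually indexed caches; only the recorded order differs. */
static const char *update_page_patch_order(void)
{
    char *dst = kmap_stub();
    memcpy(dst, "inline data", 12); /* update the mapped page */
    kunmap_stub();
    flush_dcache_stub();
    return trace;
}
```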
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Mon, Nov 10, 2014 at 07:07:59AM -0800, Jaegeuk Kim wrote: > Hi Changman, > > On Mon, Nov 10, 2014 at 06:54:37PM +0900, Changman Lee wrote: > > On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote: > > > The roll-forward mechanism should be activated when the number of active > > > logs is not 2. > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/file.c| 2 ++ > > > fs/f2fs/segment.c | 4 ++-- > > > 2 files changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index 46311e7..54722a0 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode > > > *inode) > > > need_cp = true; > > > else if (test_opt(sbi, FASTBOOT)) > > > need_cp = true; > > > + else if (sbi->active_logs == 2) > > > + need_cp = true; > > > > > > return need_cp; > > > } > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > > index 2fb3d7f..16721b5d 100644 > > > --- a/fs/f2fs/segment.c > > > +++ b/fs/f2fs/segment.c > > > @@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, > > > enum page_type p_type) > > > else > > > return CURSEG_COLD_DATA; > > > } else { > > > - if (IS_DNODE(page) && !is_cold_node(page)) > > > - return CURSEG_HOT_NODE; > > > + if (IS_DNODE(page) && is_cold_node(page)) > > > + return CURSEG_WARM_NODE; > > > > Hi Jaegeuk, > > > > We should take hot/cold separation into account as well. > > In case of a dir inode, it will be mixed with COLD_NODE. > > If it's a trade-off, let's note it kindly in the comments. > > NAK. > This patch tries to fix a bug, which is not a trade-off. > We should write files' direct node blocks in CURSEG_WARM_NODE for recovery. > > Thanks, Okay, the word 'trade-off' was wrong. We must be able to do recovery. However, we break the hot/cold separation rule we want. So I thought we should note its negative effect. Anyway, how about putting WARM and HOT together instead of HOT and COLD?
We can distinguish them well enough at recovery time if they are direct nodes and carry fsync_mark, even though HOT/WARM are mixed. Let me know if I'm misunderstanding something. Thanks, > > > > > Regards, > > Changman > > > > > else > > > return CURSEG_COLD_NODE; > > > } > > > -- > > > 2.1.1
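The node-side change being debated can be sketched as two small selector functions; the booleans stand in for IS_DNODE()/is_cold_node() and the enum for the curseg types (a simplified sketch, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

enum curseg { HOT_NODE, WARM_NODE, COLD_NODE };

/* Before the patch: non-cold (e.g. directory) direct nodes went to the
 * hot node log; file direct nodes fell into COLD_NODE. */
static enum curseg node_seg_before(bool is_dnode, bool is_cold)
{
    if (is_dnode && !is_cold)
        return HOT_NODE;
    return COLD_NODE;
}

/* After the patch: cold direct nodes (regular files) go to the warm
 * node log, which roll-forward recovery scans; directory direct nodes
 * now fall through to COLD_NODE -- the mixing Changman points out. */
static enum curseg node_seg_after(bool is_dnode, bool is_cold)
{
    if (is_dnode && is_cold)
        return WARM_NODE;
    return COLD_NODE;
}
```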
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote: > The roll-forward mechanism should be activated when the number of active > logs is not 2. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/file.c| 2 ++ > fs/f2fs/segment.c | 4 ++-- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 46311e7..54722a0 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode *inode) > need_cp = true; > else if (test_opt(sbi, FASTBOOT)) > need_cp = true; > + else if (sbi->active_logs == 2) > + need_cp = true; > > return need_cp; > } > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 2fb3d7f..16721b5d 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, enum > page_type p_type) > else > return CURSEG_COLD_DATA; > } else { > - if (IS_DNODE(page) && !is_cold_node(page)) > - return CURSEG_HOT_NODE; > + if (IS_DNODE(page) && is_cold_node(page)) > + return CURSEG_WARM_NODE; Hi Jaegeuk, We should take hot/cold separation into account as well. In case of a dir inode, it will be mixed with COLD_NODE. If it's a trade-off, let's note it kindly in the comments. Regards, Changman > else > return CURSEG_COLD_NODE; > } > -- > 2.1.1
Re: [f2fs-dev] [PATCH] f2fs: implement -o dirsync
On Sun, Nov 09, 2014 at 10:24:22PM -0800, Jaegeuk Kim wrote: > If a mount option has dirsync, we should call checkpoint for all the directory > operations. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/namei.c | 24 > 1 file changed, 24 insertions(+) > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > index 6312dd2..db3ee09 100644 > --- a/fs/f2fs/namei.c > +++ b/fs/f2fs/namei.c > @@ -138,6 +138,9 @@ static int f2fs_create(struct inode *dir, struct dentry > *dentry, umode_t mode, > stat_inc_inline_inode(inode); > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > handle_failed_inode(inode); > @@ -164,6 +167,9 @@ static int f2fs_link(struct dentry *old_dentry, struct > inode *dir, > f2fs_unlock_op(sbi); > > d_instantiate(dentry, inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > @@ -233,6 +239,9 @@ static int f2fs_unlink(struct inode *dir, struct dentry > *dentry) > f2fs_delete_entry(de, page, dir, inode); > f2fs_unlock_op(sbi); > > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > + > /* In order to evict this inode, we set it dirty */ > mark_inode_dirty(inode); Let's move it below mark_inode_dirty. After sync, it's unnecessary inserting inode into dirty_list. 
> fail: > @@ -268,6 +277,9 @@ static int f2fs_symlink(struct inode *dir, struct dentry > *dentry, > > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return err; > out: > handle_failed_inode(inode); > @@ -304,6 +316,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry > *dentry, umode_t mode) > d_instantiate(dentry, inode); > unlock_new_inode(inode); > > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > > out_fail: > @@ -346,8 +360,12 @@ static int f2fs_mknod(struct inode *dir, struct dentry > *dentry, > f2fs_unlock_op(sbi); > > alloc_nid_done(sbi, inode->i_ino); > + > d_instantiate(dentry, inode); > unlock_new_inode(inode); > + > + if (IS_DIRSYNC(dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out: > handle_failed_inode(inode); > @@ -461,6 +479,9 @@ static int f2fs_rename(struct inode *old_dir, struct > dentry *old_dentry, > } > > f2fs_unlock_op(sbi); > + > + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > > put_out_dir: > @@ -600,6 +621,9 @@ static int f2fs_cross_rename(struct inode *old_dir, > struct dentry *old_dentry, > update_inode_page(new_dir); > > f2fs_unlock_op(sbi); > + > + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) > + f2fs_sync_fs(sbi->sb, 1); > return 0; > out_undo: > /* Still we may fail to recover name info of f2fs_inode here */ > -- > 2.1.1
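The pattern the patch repeats across every namespace operation can be condensed to a few lines; sync_fs_stub() stands in for f2fs_sync_fs(sb, 1) (an assumption for illustration, not the kernel call):

```c
#include <assert.h>
#include <stdbool.h>

static int checkpoints;                 /* counts forced checkpoints */

/* Stand-in for f2fs_sync_fs(sb, 1). */
static void sync_fs_stub(void) { checkpoints++; }

/* When the parent directory is marked dirsync, the metadata update must
 * be made durable before the syscall returns; otherwise nothing extra
 * happens on this path. */
static int dir_op_sketch(bool dir_is_dirsync)
{
    /* ... the directory update itself would happen here ... */
    if (dir_is_dirsync)
        sync_fs_stub();
    return 0;
}
```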
Re: [f2fs-dev] [PATCH 4/5] f2fs: write node pages if checkpoint is not doing
On Sat, Nov 08, 2014 at 11:36:08PM -0800, Jaegeuk Kim wrote: > It needs to write node pages if checkpoint is not doing in order to avoid > memory pressure. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/node.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 4ea2c47..6f514fb 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1314,10 +1314,12 @@ static int f2fs_write_node_page(struct page *page, > return 0; > } > > - if (wbc->for_reclaim) > - goto redirty_out; > - > - down_read(&sbi->node_write); > + if (wbc->for_reclaim) { > + if (!down_read_trylock(&sbi->node_write)) > + goto redirty_out; Previously, we skipped write_page for the reclaim path, but from now on, we will write out node pages to reclaim memory at any time except during checkpoint. We should keep in mind that this may break bio merging. Got it. Reviewed-by: Changman Lee > + } else { > + down_read(&sbi->node_write); > + } > set_page_writeback(page); > write_node_page(sbi, page, &fio, nid, ni.blk_addr, &new_addr); > set_node_addr(sbi, &ni, new_addr, is_fsync_dnode(page)); > -- > 2.1.1
Re: [f2fs-dev] [PATCH 3/5] f2fs: control the memory footprint used by ino entries
On Sat, Nov 08, 2014 at 11:36:07PM -0800, Jaegeuk Kim wrote: > This patch adds to control the memory footprint used by ino entries. > This will conduct best effort, not strictly. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/node.c| 28 ++-- > fs/f2fs/node.h| 3 ++- > fs/f2fs/segment.c | 3 ++- > 3 files changed, 26 insertions(+), 8 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 44b8afe..4ea2c47 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -31,22 +31,38 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int > type) > { > struct f2fs_nm_info *nm_i = NM_I(sbi); > struct sysinfo val; > + unsigned long avail_ram; > unsigned long mem_size = 0; > bool res = false; > > si_meminfo(&val); > - /* give 25%, 25%, 50% memory for each components respectively */ > + > + /* only uses low memory */ > + avail_ram = val.totalram - val.totalhigh; > + > + /* give 25%, 25%, 50%, 50% memory for each components respectively */ Hi Jaegeuk, The memory usage of nm_i should be 100% but it's 125%. Mistake or intended? 
> if (type == FREE_NIDS) { > - mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> 12; > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2); > + mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> > + PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2); > } else if (type == NAT_ENTRIES) { > - mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> 12; > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2); > + mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> > + PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2); > } else if (type == DIRTY_DENTS) { > if (sbi->sb->s_bdi->dirty_exceeded) > return false; > mem_size = get_pages(sbi, F2FS_DIRTY_DENTS); > - res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 1); > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); > + } else if (type == INO_ENTRIES) { > + int i; > + > + if (sbi->sb->s_bdi->dirty_exceeded) > + return false; > + for (i = 0; i <= UPDATE_INO; i++) > + mem_size += (sbi->ino_num[i] * sizeof(struct ino_entry)) > + >> PAGE_CACHE_SHIFT; > + res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); > } > return res; > } > diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h > index acb71e5..d10b644 100644 > --- a/fs/f2fs/node.h > +++ b/fs/f2fs/node.h > @@ -106,7 +106,8 @@ static inline void raw_nat_from_node_info(struct > f2fs_nat_entry *raw_ne, > enum mem_type { > FREE_NIDS, /* indicates the free nid list */ > NAT_ENTRIES,/* indicates the cached nat entry */ > - DIRTY_DENTS /* indicates dirty dentry pages */ > + DIRTY_DENTS,/* indicates dirty dentry pages */ > + INO_ENTRIES,/* indicates inode entries */ > }; > > struct nat_entry_set { > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 16721b5d..e094675 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -276,7 +276,8 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) > { > /* check the # of cached NAT entries and prefree 
segments */ > if (try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK) || > - excess_prefree_segs(sbi)) > + excess_prefree_segs(sbi) || > + available_free_memory(sbi, INO_ENTRIES)) > f2fs_sync_fs(sbi->sb, true); > } > > -- > 2.1.1
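The budget arithmetic behind available_free_memory() is easy to check with illustrative numbers (the values below are hypothetical): avail_ram is in pages, ram_thresh is the percentage reserved for these caches, and the shift hands a cache 1/4 (">> 2") or 1/2 (">> 1") of that reservation. Summing the four components at 25% + 25% + 50% + 50% gives more than 100% of the reservation, which is the over-commit Changman asks about.

```c
#include <assert.h>

/* Per-cache budget: (avail_ram * ram_thresh / 100) >> shift,
 * mirroring the expressions in available_free_memory(). */
static unsigned long cache_budget(unsigned long avail_ram_pages,
                                  unsigned int ram_thresh_pct, int shift)
{
    return (avail_ram_pages * ram_thresh_pct / 100) >> shift;
}
```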
Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
Hi JK, I think it's nicer if this can be used as an 'OR' together with other policies. If so, we can also cover the weakness in high utilization. Regards, Changman On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote: > If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file > only starts to try in-place-updates. > And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it > keeps out-of-order manner. Otherwise, it triggers in-place-updates. > > This may be used by storage showing very high random write performance. > > For example, it can be used when, > > Seq. writes (Data) + wait + Seq. writes (Node) > > is pretty much slower than, > > Rand. writes (Data) > > Signed-off-by: Jaegeuk Kim > --- > Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++ > Documentation/filesystems/f2fs.txt | 9 - > fs/f2fs/f2fs.h | 1 + > fs/f2fs/file.c | 7 +++ > fs/f2fs/segment.c | 3 ++- > fs/f2fs/segment.h | 14 ++ > fs/f2fs/super.c | 2 ++ > 7 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs > b/Documentation/ABI/testing/sysfs-fs-f2fs > index 62dd725..6f9157f 100644 > --- a/Documentation/ABI/testing/sysfs-fs-f2fs > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs > @@ -44,6 +44,13 @@ Description: >Controls the FS utilization condition for the in-place-update >policies. > > +What:/sys/fs/f2fs/<disk>/min_fsync_blocks > +Date:September 2014 > +Contact: "Jaegeuk Kim" > +Description: > + Controls the dirty page count condition for the in-place-update > + policies. > + > What:/sys/fs/f2fs/<disk>/max_small_discards > Date:November 2013 > Contact: "Jaegeuk Kim" > diff --git a/Documentation/filesystems/f2fs.txt > b/Documentation/filesystems/f2fs.txt > index a2046a7..d010da8 100644 > --- a/Documentation/filesystems/f2fs.txt > +++ b/Documentation/filesystems/f2fs.txt > @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname> >updates in f2fs. 
There are five policies: > 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR, > 2: F2FS_IPU_UTIL, 3: F2FS_IPU_SSR_UTIL, > - 4: F2FS_IPU_DISABLE. > + 4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE. > > min_ipu_util This parameter controls the threshold to > trigger >in-place-updates. The number indicates > percentage >of the filesystem utilization, and used by >F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. > > + min_fsync_blocks This parameter controls the threshold to > trigger > + in-place-updates when F2FS_IPU_FSYNC mode is > set. > + The number indicates the number of dirty pages > + when fsync needs to flush on its call path. If > + the number is less than this value, it triggers > + in-place-updates. > + > max_victim_search This parameter controls the number of trials to > find a victim segment when conducting SSR and > cleaning operations. The default value is 4096 > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 2756c16..4f84d2a 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -386,6 +386,7 @@ struct f2fs_sm_info { > > unsigned int ipu_policy;/* in-place-update policy */ > unsigned int min_ipu_util; /* in-place-update threshold */ > + unsigned int min_fsync_blocks; /* threshold for fsync */ > > /* for flush command control */ > struct flush_cmd_control *cmd_control_info; > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 77426c7..af06e22 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > trace_f2fs_sync_file_enter(inode); > > /* if fdatasync is triggered, let's do in-place-update */ > - if (datasync) > + if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks) > set_inode_flag(fi, FI_NEED_IPU); > - > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > - if (datasync) > - clear_inode_flag(fi, FI_NEED_IPU); > + clear_inode_flag(fi, FI_NEED_IPU); > + > if (ret) { > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > return ret; > 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index e158d63..c6f627b 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1928,8
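Changman's suggestion above can be sketched as a bitmask: let ipu_policy be a set of OR-able flags instead of exclusive values, so F2FS_IPU_FSYNC can combine with a utilization-based policy and cover the high-utilization weakness too. The names and the helper below are hypothetical, not the f2fs API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical OR-able policy flags. */
enum {
    IPU_SSR   = 1 << 0,
    IPU_UTIL  = 1 << 1,
    IPU_FSYNC = 1 << 2,
};

/* In-place update triggers if ANY enabled policy's condition holds. */
static bool need_ipu(unsigned policy, unsigned util_pct,
                     unsigned min_ipu_util, bool in_fsync_path)
{
    if ((policy & IPU_UTIL) && util_pct >= min_ipu_util)
        return true;
    if ((policy & IPU_FSYNC) && in_fsync_path)
        return true;
    return false;
}
```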
Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
Hi JK, I think it's nicer if this can be used as 'OR' with other policies together. If so, we can also cover the weakness in high utilization. Regards, Changman On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote: If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++ Documentation/filesystems/f2fs.txt | 9 - fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 7 +++ fs/f2fs/segment.c | 3 ++- fs/f2fs/segment.h | 14 ++ fs/f2fs/super.c | 2 ++ 7 files changed, 33 insertions(+), 10 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 62dd725..6f9157f 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -44,6 +44,13 @@ Description: Controls the FS utilization condition for the in-place-update policies. +What: /sys/fs/f2fs/<disk>/min_fsync_blocks +Date: September 2014 +Contact: Jaegeuk Kim jaeg...@kernel.org +Description: + Controls the dirty page count condition for the in-place-update + policies. + What: /sys/fs/f2fs/<disk>/max_small_discards Date: November 2013 Contact: Jaegeuk Kim jaegeuk@samsung.com diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index a2046a7..d010da8 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/<devname> updates in f2fs. There are five policies: 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR, 2: F2FS_IPU_UTIL, 3: F2FS_IPU_SSR_UTIL, - 4: F2FS_IPU_DISABLE. 
+ 4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE. min_ipu_util This parameter controls the threshold to trigger in-place-updates. The number indicates percentage of the filesystem utilization, and used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. + min_fsync_blocks This parameter controls the threshold to trigger + in-place-updates when F2FS_IPU_FSYNC mode is set. + The number indicates the number of dirty pages + when fsync needs to flush on its call path. If + the number is less than this value, it triggers + in-place-updates. + max_victim_search This parameter controls the number of trials to find a victim segment when conducting SSR and cleaning operations. The default value is 4096 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2756c16..4f84d2a 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -386,6 +386,7 @@ struct f2fs_sm_info { unsigned int ipu_policy;/* in-place-update policy */ unsigned int min_ipu_util; /* in-place-update threshold */ + unsigned int min_fsync_blocks; /* threshold for fsync */ /* for flush command control */ struct flush_cmd_control *cmd_control_info; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 77426c7..af06e22 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) trace_f2fs_sync_file_enter(inode); /* if fdatasync is triggered, let's do in-place-update */ - if (datasync) + if (get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks) set_inode_flag(fi, FI_NEED_IPU); - ret = filemap_write_and_wait_range(inode->i_mapping, start, end); - if (datasync) - clear_inode_flag(fi, FI_NEED_IPU); + clear_inode_flag(fi, FI_NEED_IPU); + if (ret) { trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); return ret; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e158d63..c6f627b 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1928,8 +1928,9 @@ int build_segment_manager(struct
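The policy selection in the patch above can be sketched in userspace C. This is a hypothetical simplification, not f2fs code: SSR availability is passed in as a flag rather than computed from segment state, and only the per-policy decision is modeled. (Changman's 'OR' suggestion corresponds to treating `ipu_policy` as a bitmask of these conditions, which mainline f2fs later adopted.)

```c
#include <stdbool.h>

/* Hypothetical userspace model of f2fs's in-place-update policy check.
 * Names mirror the kernel's, but the inputs are simplified. */
enum ipu_policy {
	F2FS_IPU_FORCE,		/* always update in place */
	F2FS_IPU_SSR,		/* only when SSR segments are available */
	F2FS_IPU_UTIL,		/* only above a utilization threshold */
	F2FS_IPU_SSR_UTIL,	/* both of the above */
	F2FS_IPU_FSYNC,		/* only on the fsync path (this patch) */
	F2FS_IPU_DISABLE,	/* never */
};

struct sm_info {
	enum ipu_policy ipu_policy;
	unsigned int min_ipu_util;	/* utilization threshold, percent */
	unsigned int min_fsync_blocks;	/* dirty page count threshold */
};

static bool need_inplace_update(const struct sm_info *sm, unsigned int util,
				bool has_ssr, bool in_fsync,
				unsigned int dirty_pages)
{
	switch (sm->ipu_policy) {
	case F2FS_IPU_FORCE:
		return true;
	case F2FS_IPU_SSR:
		return has_ssr;
	case F2FS_IPU_UTIL:
		return util > sm->min_ipu_util;
	case F2FS_IPU_SSR_UTIL:
		return has_ssr && util > sm->min_ipu_util;
	case F2FS_IPU_FSYNC:
		/* IPU only on the fsync path, and only for small flushes */
		return in_fsync && dirty_pages <= sm->min_fsync_blocks;
	case F2FS_IPU_DISABLE:
	default:
		return false;
	}
}
```

With `F2FS_IPU_FSYNC`, a small fsync (few dirty pages) takes the in-place path and skips node writes, while a large flush keeps the normal out-of-place ordering.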
Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Hi, On Thu, Aug 28, 2014 at 04:53:01PM +0800, Chao Yu wrote: > Hi Changman, > > > -Original Message- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Thursday, August 28, 2014 9:48 AM > > To: Chao Yu > > Cc: Jaegeuk Kim; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to > > prevent accessing invalid > > inode > > > > Hi Chao, > > > > I agree it's correct unlock_new_inode should be located after > > make_bad_inode. > > > > About this scenario, > > I think we should check some condition if this could be occured; > > I think this condition is the almost impossible but which can happen > theoretically. > > > A inode allocated newly could be victim by gc thread. > > Then, f2fs_iget called by Thread A have to fail because we handled it as > > bad_inode in Thread B. However, f2fs_iget could still get inode. > > How about check it using is_bad_inode() in f2fs_iget. > > Yes, agreed. How about return -EIO when this inode we iget_locked is bad? Hmm.. It might be better to check return value of f2fs_iget like other f/s. 
--- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -595,6 +595,8 @@ next_step: inode = f2fs_iget(sb, dni.ino); if (IS_ERR(inode)) continue; + else if (is_bad_inode(inode)) + continue; Thanks, Changman > > Thanks, > Yu > > > > > Thanks, > > > > On Tue, Aug 26, 2014 at 06:35:29PM +0800, Chao Yu wrote: > > > As the race condition on the inode cache, following scenario can appear: > > > [Thread a][Thread b] > > > ->f2fs_mkdir > > > ->f2fs_add_link > > > ->__f2fs_add_link > > > ->init_inode_metadata failed here > > > ->gc_thread_func > > > ->f2fs_gc > > > ->do_garbage_collect > > > ->gc_data_segment > > > ->f2fs_iget > > > ->iget_locked > > > ->wait_on_inode > > > ->unlock_new_inode > > > ->move_data_page > > > ->make_bad_inode > > > ->iput > > > > > > When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated > > > inode > > > should be set as bad to avoid being accessed by other thread. But in above > > > scenario, it allows f2fs to access the invalid inode before this inode > > > was set > > > as bad. > > > This patch fix the potential problem, and this issue was found by code > > > review. 
> > > > > > Signed-off-by: Chao Yu > > > --- > > > fs/f2fs/namei.c | 10 +- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > > > index 6b53ce9..845f1be 100644 > > > --- a/fs/f2fs/namei.c > > > +++ b/fs/f2fs/namei.c > > > @@ -134,8 +134,8 @@ static int f2fs_create(struct inode *dir, struct > > > dentry *dentry, umode_t > > mode, > > > return 0; > > > out: > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, ino); > > > return err; > > > @@ -267,8 +267,8 @@ static int f2fs_symlink(struct inode *dir, struct > > > dentry *dentry, > > > return err; > > > out: > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, inode->i_ino); > > > return err; > > > @@ -308,8 +308,8 @@ static int f2fs_mkdir(struct inode *dir, struct > > > dentry *dentry, umode_t > > mode) > > > out_fail: > > > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > > > clear_nlink(inode); > > > - unlock_new_inode(inode); > > > make_bad_inode(inode); > > > + unlock_new_inode(inode); > > > iput(inode); > > > alloc_nid_failed(sbi, inode->i_ino); > > >
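The reordering in the patch above hinges on one invariant: the inode must be marked bad *before* waiters on the I_NEW state are released, so that a concurrent `f2fs_iget` can never observe a valid-looking but half-initialized inode. The handshake can be modeled in userspace with a mutex and a condition variable; this is a hypothetical sketch (the struct and function names only mirror the kernel's), not f2fs code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the I_NEW handshake: the creating thread fails,
 * marks the inode bad, then "unlocks" it; the GC thread waits for the
 * inode to leave the I_NEW state and must then see the bad flag.
 * With the pre-patch order (unlock before mark-bad) there would be a
 * window where the waiter sees a live-looking but invalid inode. */
struct model_inode {
	pthread_mutex_t lock;
	pthread_cond_t unlocked;
	bool i_new;	/* models the I_NEW state bit */
	bool bad;	/* models make_bad_inode() */
};

static void model_make_bad_inode(struct model_inode *in)
{
	pthread_mutex_lock(&in->lock);
	in->bad = true;
	pthread_mutex_unlock(&in->lock);
}

static void model_unlock_new_inode(struct model_inode *in)
{
	pthread_mutex_lock(&in->lock);
	in->i_new = false;			/* release waiters */
	pthread_cond_broadcast(&in->unlocked);
	pthread_mutex_unlock(&in->lock);
}

/* The GC side: wait_on_inode() followed by an is_bad_inode() check. */
static bool iget_sees_bad(struct model_inode *in)
{
	bool bad;

	pthread_mutex_lock(&in->lock);
	while (in->i_new)
		pthread_cond_wait(&in->unlocked, &in->lock);
	bad = in->bad;
	pthread_mutex_unlock(&in->lock);
	return bad;
}

static void *creator(void *arg)
{
	struct model_inode *in = arg;

	/* patched order: mark bad first, then release waiters */
	model_make_bad_inode(in);
	model_unlock_new_inode(in);
	return NULL;
}

static bool run_model(void)
{
	struct model_inode in = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
		true, false,
	};
	pthread_t t;
	bool bad;

	pthread_create(&t, NULL, creator, &in);
	bad = iget_sees_bad(&in);	/* plays the GC thread */
	pthread_join(t, NULL);
	return bad;
}
```

Because `bad` is set before `i_new` is cleared, and both are read under the same lock, the waiter deterministically observes the bad flag; swapping the two calls back would reintroduce the race.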
Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Hi Chao, I agree it's correct that unlock_new_inode should be located after make_bad_inode. About this scenario, I think we should check some condition in case this could occur; an inode allocated newly could be a victim of the gc thread. Then, f2fs_iget called by Thread A has to fail because we handled it as bad_inode in Thread B. However, f2fs_iget could still get the inode. How about checking it using is_bad_inode() in f2fs_iget? Thanks, On Tue, Aug 26, 2014 at 06:35:29PM +0800, Chao Yu wrote: > As the race condition on the inode cache, following scenario can appear: > [Thread a][Thread b] > ->f2fs_mkdir > ->f2fs_add_link > ->__f2fs_add_link > ->init_inode_metadata failed here > ->gc_thread_func > ->f2fs_gc > ->do_garbage_collect > ->gc_data_segment > ->f2fs_iget > ->iget_locked > ->wait_on_inode > ->unlock_new_inode > ->move_data_page > ->make_bad_inode > ->iput > > When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode > should be set as bad to avoid being accessed by other thread. But in above > scenario, it allows f2fs to access the invalid inode before this inode was set > as bad. > This patch fix the potential problem, and this issue was found by code review. 
> > Signed-off-by: Chao Yu > --- > fs/f2fs/namei.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c > index 6b53ce9..845f1be 100644 > --- a/fs/f2fs/namei.c > +++ b/fs/f2fs/namei.c > @@ -134,8 +134,8 @@ static int f2fs_create(struct inode *dir, struct dentry > *dentry, umode_t mode, > return 0; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, ino); > return err; > @@ -267,8 +267,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry > *dentry, > return err; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -308,8 +308,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry > *dentry, umode_t mode) > out_fail: > clear_inode_flag(F2FS_I(inode), FI_INC_LINK); > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -354,8 +354,8 @@ static int f2fs_mknod(struct inode *dir, struct dentry > *dentry, > return 0; > out: > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > @@ -688,8 +688,8 @@ release_out: > out: > f2fs_unlock_op(sbi); > clear_nlink(inode); > - unlock_new_inode(inode); > make_bad_inode(inode); > + unlock_new_inode(inode); > iput(inode); > alloc_nid_failed(sbi, inode->i_ino); > return err; > -- > 2.0.0.421.g786a89d > > > > -- > Slashdot TV. > Video for Nerds. Stuff that matters. 
> http://tv.slashdot.org/ > ___ > Linux-f2fs-devel mailing list > linux-f2fs-de...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
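The suggestion above is that callers of f2fs_iget should screen the returned inode the way other filesystems do: first for an error pointer, then for a bad inode. The kernel's ERR_PTR convention (from `<linux/err.h>`) can be re-created in userspace to show the shape of that check; the inode struct and lookup below are hypothetical stand-ins, not f2fs code:

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace re-creation of the kernel's error-pointer helpers: error
 * codes are encoded in the top 4095 pointer values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct inode { bool bad; };	/* toy inode: only the "bad" state */

static inline bool is_bad_inode(const struct inode *inode)
{
	return inode->bad;
}

/* Toy iget(): ino 0 fails outright; other inos return a cached inode
 * that may still have been marked bad by a failing creator. */
static struct inode *toy_iget(unsigned long ino, struct inode *cache)
{
	if (ino == 0)
		return ERR_PTR(-ENOENT);
	return &cache[ino];
}

/* The gc_data_segment()-style loop with the proposed extra check:
 * skip both error pointers and bad inodes. */
static int count_usable(struct inode *cache, unsigned long n)
{
	int usable = 0;

	for (unsigned long ino = 0; ino < n; ino++) {
		struct inode *inode = toy_iget(ino, cache);

		if (IS_ERR(inode))
			continue;
		if (is_bad_inode(inode))	/* the proposed check */
			continue;
		usable++;
	}
	return usable;
}
```

The two checks are not redundant: `IS_ERR` catches lookups that failed outright, while `is_bad_inode` catches inodes that were found in the cache but invalidated by a failing create path, which is exactly the race discussed in this thread.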
Re: [f2fs-dev] [PATCH] f2fs: reduce competition among node page writes
Hi Chao, On Wed, Jul 30, 2014 at 09:07:49PM +0800, Chao Yu wrote: > Hi Jaegeuk Changman, > > > -Original Message- > > From: Chao Yu [mailto:chao2...@samsung.com] > > Sent: Thursday, July 03, 2014 6:59 PM > > To: Jaegeuk Kim; Changman Lee > > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: [f2fs-dev] [PATCH] f2fs: reduce competition among node page writes > > > > We do not need to block on ->node_write among different node page writers > > e.g. > > fsync/flush, unless we have a node page writer from write_checkpoint. > > So it's better use rw_semaphore instead of mutex type for ->node_write to > > promote performance. > > If you could have time to help explaining the problem of this patch, I will be > appreciated for that. I have no clue. Except checkpoint, I don't know why need to block to write node page. Do you have any problem when you test with this patch? > > Another question is what is ->writepages in sbi used for? I'm not quite clear. > I remember it is for writing data pages per thread as much as possible. When multi-threads write some files simultaneously, multi-threads contended with each other to allocate a block. So block allocation was interleaved across threads. It makes fragmentation of file. Thanks, > Thanks, > > > > > Signed-off-by: Chao Yu > > --- > > fs/f2fs/checkpoint.c |6 +++--- > > fs/f2fs/f2fs.h |2 +- > > fs/f2fs/node.c |4 ++-- > > fs/f2fs/super.c |2 +- > > 4 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index 0b4710c..eec406b 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -714,10 +714,10 @@ retry_flush_dents: > > * until finishing nat/sit flush. 
> > */ > > retry_flush_nodes: > > - mutex_lock(&sbi->node_write); > > + down_write(&sbi->node_write); > > > > if (get_pages(sbi, F2FS_DIRTY_NODES)) { > > - mutex_unlock(&sbi->node_write); > > + up_write(&sbi->node_write); > > sync_node_pages(sbi, 0, &wbc); > > goto retry_flush_nodes; > > } > > @@ -726,7 +726,7 @@ retry_flush_nodes: > > > > static void unblock_operations(struct f2fs_sb_info *sbi) > > { > > - mutex_unlock(&sbi->node_write); > > + up_write(&sbi->node_write); > > f2fs_unlock_all(sbi); > > } > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index ae3b4ac..ca30b5a 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -444,7 +444,7 @@ struct f2fs_sb_info { > > struct inode *meta_inode; /* cache meta blocks */ > > struct mutex cp_mutex; /* checkpoint procedure lock */ > > struct rw_semaphore cp_rwsem; /* blocking FS operations */ > > - struct mutex node_write;/* locking node writes */ > > + struct rw_semaphore node_write; /* locking node writes */ > > struct mutex writepages;/* mutex for writepages() */ > > bool por_doing; /* recovery is doing or not */ > > wait_queue_head_t cp_wait; > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index a90f51d..7b5b5de 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1231,12 +1231,12 @@ static int f2fs_write_node_page(struct page *page, > > if (wbc->for_reclaim) > > goto redirty_out; > > > > - mutex_lock(&sbi->node_write); > > + down_read(&sbi->node_write); > > set_page_writeback(page); > > write_node_page(sbi, page, &fio, nid, ni.blk_addr, &new_addr); > > set_node_addr(sbi, &ni, new_addr, is_fsync_dnode(page)); > > dec_page_count(sbi, F2FS_DIRTY_NODES); > > - mutex_unlock(&sbi->node_write); > > + up_read(&sbi->node_write); > > unlock_page(page); > > return 0; > > > > diff --git a/fs/f2fs/super.c > > b/fs/f2fs/super.c > > index 8f96d93..bed9413 100644 > > --- a/fs/f2fs/super.c > > +++ b/fs/f2fs/super.c > > @@ -947,7 +947,7 @@ static int f2fs_fill_super(struct super_block *sb, void > > *data, int silent) > > mutex_init(&sbi->gc_mutex); > > 
mutex_init(&sbi->writepages); > > mutex_init(&sbi->cp_mutex); > > - mutex_init(&sbi->node_write); > > + init_rwsem(&sbi->node_write); > > sbi->por_doing = false; > > spin_lock_init(&sbi->stat_lock); > > > > -- > > 1.7.9.5
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 06:08:21PM -0700, Jaegeuk Kim wrote: > On Wed, Jul 30, 2014 at 08:54:55AM +0900, Changman Lee wrote: > > On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: > > > Hi Changman, > > > > > > On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: > > > > Hi Jaegeuk, > > > > > > > > On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > > > > > This patch enforces in-place-updates only when fdatasync is requested. > > > > > If we adopt this in-place-updates for the fdatasync, we can skip to > > > > > write the > > > > > recovery information. > > > > > > > > But, as you know, random write occurs when changing into > > > > in-place-updates. > > > > It will degrade write performance. Is there any case in-place-updates is > > > > better, except recovery or high utilization? > > > > > > As I described, you can easily imagine, if users requested small amount > > > of data > > > writes with fdatasync, we should do data writes + node writes. > > > But, if we can do in-place-update, we don't need to write node blocks. > > > Surely it triggers random writes, however, the amount of data is preety > > > small > > > and the device handles them very fast by its inside cache, so that it can > > > enhance the performance. > > > > > > Thanks, > > > > Partially agree. Sometimes, I see that SSR shows lower performance than > > IPU. One of the reasons might be node writes. > > What did you mean? That's why I consider IPU eagarly instead of SSR and LFS > under the very strict cases. > Okay, I understood your intention. This discussion seems to be a little far from this thread. The background I mentioned above is that I got better numbers from IPU when I tested fio under fragmentation by varmail and dd, at about 93% utilization. The result of perf shows f2fs spends the most cpu time searching for a victim in SSR mode, and f2fs had to write node data additionally. I think this condition could be one of the strict cases you mentioned. 
Thanks, > > Anyway, if so, we should know total dirty pages for fdatasync and it's very > > tunable according to a random write performance of device. > > Agreed. We can do that either by comparing the number of dirty pages, > additional data/node writes, and cost of checkpoint at the same time. > And there is another thing is that we need to consider the number of > waiting time for end_io. > I'll look into this at some time. > > Thanks, > > > > > Thanks, > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Signed-off-by: Jaegeuk Kim > > > > > --- > > > > > fs/f2fs/f2fs.h| 1 + > > > > > fs/f2fs/file.c| 7 +++ > > > > > fs/f2fs/segment.h | 4 > > > > > 3 files changed, 12 insertions(+) > > > > > > > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > > > > index ab36025..8f8685e 100644 > > > > > --- a/fs/f2fs/f2fs.h > > > > > +++ b/fs/f2fs/f2fs.h > > > > > @@ -998,6 +998,7 @@ enum { > > > > > FI_INLINE_DATA, /* used for inline data*/ > > > > > FI_APPEND_WRITE,/* inode has appended data */ > > > > > FI_UPDATE_WRITE,/* inode has in-place-update data */ > > > > > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > > > > > }; > > > > > > > > > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int > > > > > flag) > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > > > index 121689a..e339856 100644 > > > > > --- a/fs/f2fs/file.c > > > > > +++ b/fs/f2fs/file.c > > > > > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t > > > > > start, loff_t end, int datasync) > > > > > return 0; > > > > > > > > > > trace_f2fs_sync_file_enter(inode); > > > > > + > > > > > + /* if fdatasync is triggered, let's do in-place-update */ > > > > > + if (datasync) > > > > > + set_inode_flag(fi, FI_NEED_IPU); > > > > > + > > > > > ret = filemap_write_and_wait_range(inode->i_mapping, start, > > > > > end); > > > >
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: > Hi Changman, > > On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: > > Hi Jaegeuk, > > > > On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > > > This patch enforces in-place-updates only when fdatasync is requested. > > > If we adopt this in-place-updates for the fdatasync, we can skip to write > > > the > > > recovery information. > > > > But, as you know, random write occurs when changing into in-place-updates. > > It will degrade write performance. Is there any case in-place-updates is > > better, except recovery or high utilization? > > As I described, you can easily imagine, if users requested small amount of > data > writes with fdatasync, we should do data writes + node writes. > But, if we can do in-place-update, we don't need to write node blocks. > Surely it triggers random writes, however, the amount of data is preety small > and the device handles them very fast by its inside cache, so that it can > enhance the performance. > > Thanks, Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to a random write performance of device. 
Thanks, > > > > > Thanks > > > > > > > > Signed-off-by: Jaegeuk Kim > > > --- > > > fs/f2fs/f2fs.h| 1 + > > > fs/f2fs/file.c| 7 +++ > > > fs/f2fs/segment.h | 4 > > > 3 files changed, 12 insertions(+) > > > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > > index ab36025..8f8685e 100644 > > > --- a/fs/f2fs/f2fs.h > > > +++ b/fs/f2fs/f2fs.h > > > @@ -998,6 +998,7 @@ enum { > > > FI_INLINE_DATA, /* used for inline data*/ > > > FI_APPEND_WRITE,/* inode has appended data */ > > > FI_UPDATE_WRITE,/* inode has in-place-update data */ > > > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > > > }; > > > > > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index 121689a..e339856 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, > > > loff_t end, int datasync) > > > return 0; > > > > > > trace_f2fs_sync_file_enter(inode); > > > + > > > + /* if fdatasync is triggered, let's do in-place-update */ > > > + if (datasync) > > > + set_inode_flag(fi, FI_NEED_IPU); > > > + > > > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > > > if (ret) { > > > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > > > return ret; > > > } > > > + if (datasync) > > > + clear_inode_flag(fi, FI_NEED_IPU); > > > > > > /* > > >* if there is no written data, don't waste time to write recovery info. 
> > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h > > > index ee5c75e..55973f7 100644 > > > --- a/fs/f2fs/segment.h > > > +++ b/fs/f2fs/segment.h > > > @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode > > > *inode) > > > if (S_ISDIR(inode->i_mode)) > > > return false; > > > > > > + /* this is only set during fdatasync */ > > > + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) > > > + return true; > > > + > > > switch (SM_I(sbi)->ipu_policy) { > > > case F2FS_IPU_FORCE: > > > return true; > > > -- > > > 1.8.5.2 (Apple Git-48) > > > > > > > > > -- > > > Want fast and easy access to all the code in your enterprise? Index and > > > search up to 200,000 lines of code with a free copy of Black Duck > > > Code Sight - the same software that powers the world's largest code > > > search on Ohloh, the Black Duck Open Hub! Try it now. > > > http://p.sf.net/sfu/bds > > > ___ > > > Linux-f2fs-devel mailing list > > > linux-f2fs-de...@lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
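The mechanism quoted above is small but easy to misread: `FI_NEED_IPU` is raised only for the duration of the fdatasync writeback, so `need_inplace_update()` answers true exactly while those pages are being flushed and at no other time. A hypothetical userspace model (names mirror the kernel's, with every other policy check stripped out; the writeback is a callback so the flag can be probed mid-flush):

```c
#include <stdbool.h>

/* Model of the FI_NEED_IPU window from the patch above. */
enum { FI_NEED_IPU = 1u << 0 };

struct f2fs_inode_info { unsigned int flags; };

static void set_inode_flag(struct f2fs_inode_info *fi, unsigned int f)
{
	fi->flags |= f;
}

static void clear_inode_flag(struct f2fs_inode_info *fi, unsigned int f)
{
	fi->flags &= ~f;
}

/* need_inplace_update() reduced to just the per-inode flag. */
static bool need_inplace_update(const struct f2fs_inode_info *fi)
{
	return fi->flags & FI_NEED_IPU;
}

static bool during;	/* flag state observed from inside the writeback */

/* Stands in for filemap_write_and_wait_range(): records whether the
 * IPU decision would fire while pages are being written. */
static void writeback(struct f2fs_inode_info *fi)
{
	during = need_inplace_update(fi);
}

/* Models the f2fs_sync_file() sequence: set flag, write, clear flag.
 * Returns true iff IPU was enabled during the flush and off after. */
static bool ipu_only_during_fdatasync(bool datasync)
{
	struct f2fs_inode_info fi = { 0 };

	if (datasync)
		set_inode_flag(&fi, FI_NEED_IPU);
	writeback(&fi);
	if (datasync)
		clear_inode_flag(&fi, FI_NEED_IPU);
	return during && !need_inplace_update(&fi);
}
```

This also shows why Changman's tunability point matters: the window is all-or-nothing per call, so gating it on the dirty page count (as the later patch 04/10 does with `min_fsync_blocks`) is what turns it into a device-dependent knob.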
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: Hi Changman, On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip writing the recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? As I described, you can easily imagine, if users requested a small amount of data writes with fdatasync, we should do data writes + node writes. But, if we can do in-place-update, we don't need to write node blocks. Surely it triggers random writes; however, the amount of data is pretty small and the device handles them very fast by its internal cache, so that it can enhance the performance. Thanks,

Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to the random write performance of the device.
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 06:08:21PM -0700, Jaegeuk Kim wrote: On Wed, Jul 30, 2014 at 08:54:55AM +0900, Changman Lee wrote: On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote: Hi Changman, On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote: Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip writing the recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? As I described, you can easily imagine, if users requested a small amount of data writes with fdatasync, we should do data writes + node writes. But, if we can do in-place-update, we don't need to write node blocks. Surely it triggers random writes; however, the amount of data is pretty small and the device handles them very fast by its internal cache, so that it can enhance the performance. Thanks, Partially agree. Sometimes, I see that SSR shows lower performance than IPU. One of the reasons might be node writes. What did you mean? That's why I consider IPU eagerly instead of SSR and LFS under the very strict cases.

Okay, I understood your intention. This discussion seems to be a little bit far from this thread. The background I told as above is that I got better numbers from IPU when I tested fio under fragmentation by varmail and dd, at a utilization of about 93%. The result of perf shows f2fs spends the most CPU time searching for a victim in SSR mode. And f2fs had to write node data additionally. I think this condition could be one of the strict cases you told. Thanks,

Anyway, if so, we should know total dirty pages for fdatasync and it's very tunable according to the random write performance of the device. Agreed.
We can do that, e.g., by comparing the number of dirty pages, the additional data/node writes, and the cost of a checkpoint at the same time. Another thing we need to consider is the waiting time for end_io. I'll look into this at some time. Thanks,
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: > This patch enforces in-place-updates only when fdatasync is requested. > If we adopt this in-place-updates for the fdatasync, we can skip to write the > recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? Thanks > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/f2fs.h| 1 + > fs/f2fs/file.c| 7 +++ > fs/f2fs/segment.h | 4 > 3 files changed, 12 insertions(+) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index ab36025..8f8685e 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -998,6 +998,7 @@ enum { > FI_INLINE_DATA, /* used for inline data*/ > FI_APPEND_WRITE,/* inode has appended data */ > FI_UPDATE_WRITE,/* inode has in-place-update data */ > + FI_NEED_IPU,/* used fo ipu for fdatasync */ > }; > > static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 121689a..e339856 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > return 0; > > trace_f2fs_sync_file_enter(inode); > + > + /* if fdatasync is triggered, let's do in-place-update */ > + if (datasync) > + set_inode_flag(fi, FI_NEED_IPU); > + > ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > if (ret) { > trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); > return ret; > } > + if (datasync) > + clear_inode_flag(fi, FI_NEED_IPU); > > /* >* if there is no written data, don't waste time to write recovery info. 
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h > index ee5c75e..55973f7 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode > *inode) > if (S_ISDIR(inode->i_mode)) > return false; > > + /* this is only set during fdatasync */ > + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) > + return true; > + > switch (SM_I(sbi)->ipu_policy) { > case F2FS_IPU_FORCE: > return true; > -- > 1.8.5.2 (Apple Git-48)
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Tue, May 27, 2014 at 02:32:57PM +0800, Chao Yu wrote: > Hi Changman, > > > -Original Message- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Tuesday, May 27, 2014 9:25 AM > > To: Chao Yu > > Cc: Jaegeuk Kim; linux-fsde...@vger.kernel.org; > > linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace > > f2fs_submit_page_mbio event > > in ra_sum_pages > > > > Hi, Chao > > > > Could you think about following once. > > move node_inode in front of build_segment_manager, then use node_inode > > instead of bd_inode. > > Jaegeuk and I discussed this solution previously in > [PATCH 3/3 V3] f2fs: introduce f2fs_cache_node_page() to add page into > node_inode cache > > You can see it from this url: > http://sourceforge.net/p/linux-f2fs/mailman/linux-f2fs-devel/?viewmonth=201312=5 > > And it seems not easy to change order of build_*_manager and make node_inode, > because there are dependency between them. > Sorry to make a mess of your patch thread. I've understood it. In your patch, using NAT journal seems to be possible. Anyway, thanks for your answer. > > > > On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: > > > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > > > encounter a crash in event trace of f2fs_submit_page_mbio where we access > > > mapping data of the page. > > > > > > We'd better allocate pages in bd_inode mapping and invalidate these pages > > > after > > > we restore data from pages. It could avoid crash in above scenario. > > > > > > Changes from V1 > > > o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. > > > > > > Call Trace: > > > [] ?
ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > > [] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > > [] restore_node_summary+0x13a/0x280 [f2fs] > > > [] build_curseg+0x2bd/0x620 [f2fs] > > > [] build_segment_manager+0x1cb/0x920 [f2fs] > > > [] f2fs_fill_super+0x535/0x8e0 [f2fs] > > > [] mount_bdev+0x16a/0x1a0 > > > [] f2fs_mount+0x1f/0x30 [f2fs] > > > [] mount_fs+0x36/0x170 > > > [] vfs_kern_mount+0x55/0xe0 > > > [] do_mount+0x1e8/0x900 > > > [] SyS_mount+0x82/0xc0 > > > [] sysenter_do_call+0x12/0x22 > > > > > > Suggested-by: Jaegeuk Kim > > > Signed-off-by: Chao Yu > > > --- > > > fs/f2fs/node.c | 52 > > > > > > 1 file changed, 24 insertions(+), 28 deletions(-) > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > > index 3d60d3d..02a59e9 100644 > > > --- a/fs/f2fs/node.c > > > +++ b/fs/f2fs/node.c > > > @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > > > struct page *page) > > > > > > /* > > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > > - * these pre-readed pages are linked in pages list. > > > + * these pre-readed pages are alloced in bd_inode's mapping tree. 
> > > */ > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head > > > *pages, > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > > int start, int nrpages) > > > { > > > - struct page *page; > > > - int page_idx = start; > > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > > + struct address_space *mapping = inode->i_mapping; > > > + int i, page_idx = start; > > > struct f2fs_io_info fio = { > > > .type = META, > > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > > }; > > > > > > - for (; page_idx < start + nrpages; page_idx++) { > > > - /* alloc temporal page for read node summary info*/ > > > - page = alloc_page(GFP_F2FS_ZERO); > > > - if (!page) > > > + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { > > > + /* alloc page in bd_inode for reading node summary info */ > > > + pages[i] = grab_cache_page(mapping, page_idx); > > > + if (!pages[i]) > > > break; > > > - > > > - lock_page(page); > > > - page->index = page_idx; > > > - list_add_tail(&page->lru, pages); > > >
Re: [f2fs-dev] [PATCH] f2fs: avoid overflow when large directory feathure is enabled
Hi Chao, Good catch. Please also modify Documentation/filesystems/f2fs.txt. On Tue, May 27, 2014 at 09:06:52AM +0800, Chao Yu wrote: > When large directory feathure is enable, We have one case which could cause > overflow in dir_buckets() as following: > special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2. > > Here we define MAX_DIR_BUCKETS to limit the return value when the condition > could trigger potential overflow. > > Signed-off-by: Chao Yu > --- > fs/f2fs/dir.c |4 ++-- > include/linux/f2fs_fs.h |3 +++ > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index c3f1485..966acb0 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -23,10 +23,10 @@ static unsigned long dir_blocks(struct inode *inode) > > static unsigned int dir_buckets(unsigned int level, int dir_level) > { > - if (level < MAX_DIR_HASH_DEPTH / 2) > + if (level + dir_level < MAX_DIR_HASH_DEPTH / 2) > return 1 << (level + dir_level); > else > - return 1 << ((MAX_DIR_HASH_DEPTH / 2 + dir_level) - 1); > + return MAX_DIR_BUCKETS; > } > > static unsigned int bucket_blocks(unsigned int level) > diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h > index 8c03f71..ba6f312 100644 > --- a/include/linux/f2fs_fs.h > +++ b/include/linux/f2fs_fs.h > @@ -394,6 +394,9 @@ typedef __le32f2fs_hash_t; > /* MAX level for dir lookup */ > #define MAX_DIR_HASH_DEPTH 63 > > +/* MAX buckets in one level of dir */ > +#define MAX_DIR_BUCKETS (1 << ((MAX_DIR_HASH_DEPTH / 2) - 1)) > + > #define SIZE_OF_DIR_ENTRY11 /* by byte */ > #define SIZE_OF_DENTRY_BITMAP((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - > 1) / \ > BITS_PER_BYTE) > -- > 1.7.10.4 > > > > -- > The best possible search technologies are now affordable for all companies. > Download your FREE open source Enterprise Search Engine today! > Our experts will assist you in its installation for $59/mo, no commitment. > Test it for FREE on our Cloud platform anytime!
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Hi, Chao Could you think about following once. move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > encounter a crash in event trace of f2fs_submit_page_mbio where we access > mapping data of the page. > > We'd better allocate pages in bd_inode mapping and invalidate these pages > after > we restore data from pages. It could avoid crash in above scenario. > > Changes from V1 > o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. > > Call Trace: > [] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > [] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > [] restore_node_summary+0x13a/0x280 [f2fs] > [] build_curseg+0x2bd/0x620 [f2fs] > [] build_segment_manager+0x1cb/0x920 [f2fs] > [] f2fs_fill_super+0x535/0x8e0 [f2fs] > [] mount_bdev+0x16a/0x1a0 > [] f2fs_mount+0x1f/0x30 [f2fs] > [] mount_fs+0x36/0x170 > [] vfs_kern_mount+0x55/0xe0 > [] do_mount+0x1e8/0x900 > [] SyS_mount+0x82/0xc0 > [] sysenter_do_call+0x12/0x22 > > Suggested-by: Jaegeuk Kim > Signed-off-by: Chao Yu > --- > fs/f2fs/node.c | 52 > 1 file changed, 24 insertions(+), 28 deletions(-) > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 3d60d3d..02a59e9 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > struct page *page) > > /* > * ra_sum_pages() merge contiguous pages into one bio and submit. > - * these pre-readed pages are linked in pages list. > + * these pre-readed pages are alloced in bd_inode's mapping tree. 
> */ > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > int start, int nrpages) > { > - struct page *page; > - int page_idx = start; > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > + struct address_space *mapping = inode->i_mapping; > + int i, page_idx = start; > struct f2fs_io_info fio = { > .type = META, > .rw = READ_SYNC | REQ_META | REQ_PRIO > }; > > - for (; page_idx < start + nrpages; page_idx++) { > - /* alloc temporal page for read node summary info*/ > - page = alloc_page(GFP_F2FS_ZERO); > - if (!page) > + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { > + /* alloc page in bd_inode for reading node summary info */ > + pages[i] = grab_cache_page(mapping, page_idx); > + if (!pages[i]) > break; > - > - lock_page(page); > - page->index = page_idx; > - list_add_tail(&page->lru, pages); > + f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio); > } > > - list_for_each_entry(page, pages, lru) > - f2fs_submit_page_mbio(sbi, page, page->index, &fio); > - > f2fs_submit_merged_bio(sbi, META, READ); > - > - return page_idx - start; > + return i; > } > > int restore_node_summary(struct f2fs_sb_info *sbi, > @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > { > struct f2fs_node *rn; > struct f2fs_summary *sum_entry; > - struct page *page, *tmp; > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > block_t addr; > int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); > - int i, last_offset, nrpages, err = 0; > - LIST_HEAD(page_list); > + struct page *pages[bio_blocks]; > + int i, idx, last_offset, nrpages, err = 0; > > /* scan the node segment */ > last_offset = sbi->blocks_per_seg; > @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > nrpages = min(last_offset - i, bio_blocks); > > /* read ahead node pages */ > - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); > + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); > if (!nrpages) > return -ENOMEM; > > - list_for_each_entry_safe(page, tmp, &page_list, lru) { > + for (idx = 0; idx < nrpages; idx++) { > if (err) > goto skip; > > - lock_page(page); > - if (unlikely(!PageUptodate(page))) { > + lock_page(pages[idx]); > + if (unlikely(!PageUptodate(pages[idx]))) { > err = -EIO; > } else { > - rn = F2FS_NODE(page); > + rn = F2FS_NODE(pages[idx]); > sum_entry->nid = rn->footer.nid; > sum_entry->version = 0; >
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Mon, May 26, 2014 at 02:26:24PM +0800, Chao Yu wrote: > Hi Changman, > > > -----Original Message----- > > From: Changman Lee [mailto:cm224@samsung.com] > > Sent: Friday, May 23, 2014 1:14 PM > > To: Jaegeuk Kim > > Cc: Chao Yu; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace > > f2fs_submit_page_mbio event in > > ra_sum_pages > > > > On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: > > > Hi Chao, > > > > > > 2014-05-16 (금), 17:14 +0800, Chao Yu: > > > > Previously we allocate pages with no mapping in ra_sum_pages(), so we > > > > may encounter a crash in event trace of f2fs_submit_page_mbio where we > > > > access mapping data of the page. > > > > > > > > We'd better allocate pages in bd_inode mapping and invalidate these > > > > pages after we restore data from pages. It could avoid crash in above scenario. > > > > > > > > Call Trace: > > > > [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > > > [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > > > [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] > > > > [f103e22d] build_curseg+0x2bd/0x620 [f2fs] > > > > [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] > > > > [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] > > > > [c115b66a] mount_bdev+0x16a/0x1a0 > > > > [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] > > > > [c115c096] mount_fs+0x36/0x170 > > > > [c1173635] vfs_kern_mount+0x55/0xe0 > > > > [c1175388] do_mount+0x1e8/0x900 > > > > [c1175d72] SyS_mount+0x82/0xc0 > > > > [c16059cc] sysenter_do_call+0x12/0x22 > > > > > > > > Signed-off-by: Chao Yu > > > > --- > > > > fs/f2fs/node.c | 49 - > > > > 1 file changed, 28 insertions(+), 21 deletions(-) > > > > > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > > > index 3d60d3d..b5cd814 100644 > > > > --- a/fs/f2fs/node.c > > > > +++ b/fs/f2fs/node.c > > > > @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info > > > > *sbi, struct page *page) > > > > > > > > /* > > > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > > > - * these pre-readed pages are linked in pages list. > > > > + * these pre-readed pages are alloced in bd_inode's mapping tree. > > > > */ > > > > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head > > > > *pages, > > > > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > > > int start, int nrpages) > > > > { > > > > struct page *page; > > > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > How about using sbi->meta_inode instead of bd_inode? Then we can do > > caching of summary pages for further I/O. > > In my understanding, in ra_sum_pages() we readahead node pages in the NODE > segment, > then we pad the current summary cache with the nid of each node page's footer. > So we should not cache these readaheaded pages in meta_inode's mapping. > Do I miss something? > > Regards > Sorry, you're right. Forget about caching. I've confused ra_sum_pages with summary segments. > > > > + struct address_space *mapping = inode->i_mapping; > > > > int page_idx = start; > > > > + int alloced, readed; > > > > struct f2fs_io_info fio = { > > > > .type = META, > > > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > > > @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info > > > > *sbi, struct list_head *pages, > > > > > > > > for (; page_idx < start + nrpages; page_idx++) { > > > > /* alloc temporal page for read node summary info*/ > > > > - page = alloc_page(GFP_F2FS_ZERO); > > > > + page = grab_cache_page(mapping, page_idx); > > > > if (!page) > > > > break; > > > > - > > > > - lock_page(page); > > > > - page->index = page_idx; > > > > -
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Hi Chao, could you think about the following once: move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim jaegeuk@samsung.com Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 52 1 file changed, 24 insertions(+), 28 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..02a59e9 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. */ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { - struct page *page; - int page_idx = start; + struct inode *inode = sbi->sb->s_bdev->bd_inode; + struct address_space *mapping = inode->i_mapping; + int i, page_idx = start; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO }; - for (; page_idx < start + nrpages; page_idx++) { - /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); - if (!page) + for (i = 0; page_idx < start + nrpages; page_idx++, i++) { + /* alloc page in bd_inode for reading node summary info */ + pages[i] = grab_cache_page(mapping, page_idx); + if (!pages[i]) break; - - lock_page(page); - page->index = page_idx; - list_add_tail(&page->lru, pages); + f2fs_submit_page_mbio(sbi, pages[i], page_idx, &fio); } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page->index, &fio); - f2fs_submit_merged_bio(sbi, META, READ); - - return page_idx - start; + return i; } int restore_node_summary(struct f2fs_sb_info *sbi, @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, { struct f2fs_node *rn; struct f2fs_summary *sum_entry; - struct page *page, *tmp; + struct inode *inode = sbi->sb->s_bdev->bd_inode; block_t addr; int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); - int i, last_offset, nrpages, err = 0; - LIST_HEAD(page_list); + struct page *pages[bio_blocks]; + int i, idx, last_offset, nrpages, err = 0; /* scan the node segment */ last_offset = sbi->blocks_per_seg; @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, nrpages = min(last_offset - i, bio_blocks); /* read ahead node pages */ - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); if (!nrpages) return -ENOMEM; - list_for_each_entry_safe(page, tmp, &page_list, lru) { + for (idx = 0; idx < nrpages; idx++) { if (err) goto skip; - lock_page(page); - if (unlikely(!PageUptodate(page))) { + lock_page(pages[idx]); + if (unlikely(!PageUptodate(pages[idx]))) { err = -EIO; } else { - rn = F2FS_NODE(page); + rn = F2FS_NODE(pages[idx]); sum_entry->nid = rn->footer.nid; sum_entry->version = 0;
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: > Hi Chao, > > 2014-05-16 (금), 17:14 +0800, Chao Yu: > > Previously we allocate pages with no mapping in ra_sum_pages(), so we may > > encounter a crash in event trace of f2fs_submit_page_mbio where we access > > mapping data of the page. > > > > We'd better allocate pages in bd_inode mapping and invalidate these pages > > after > > we restore data from pages. It could avoid crash in above scenario. > > > > Call Trace: > > [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] > > [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] > > [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] > > [f103e22d] build_curseg+0x2bd/0x620 [f2fs] > > [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] > > [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] > > [c115b66a] mount_bdev+0x16a/0x1a0 > > [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] > > [c115c096] mount_fs+0x36/0x170 > > [c1173635] vfs_kern_mount+0x55/0xe0 > > [c1175388] do_mount+0x1e8/0x900 > > [c1175d72] SyS_mount+0x82/0xc0 > > [c16059cc] sysenter_do_call+0x12/0x22 > > > > Signed-off-by: Chao Yu > > --- > > fs/f2fs/node.c | 49 - > > 1 file changed, 28 insertions(+), 21 deletions(-) > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index 3d60d3d..b5cd814 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info *sbi, > > struct page *page) > > > > /* > > * ra_sum_pages() merge contiguous pages into one bio and submit. > > - * these pre-readed pages are linked in pages list. > > + * these pre-readed pages are alloced in bd_inode's mapping tree. > > */ > > -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, > > +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, > > int start, int nrpages) > > { > > struct page *page; > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; How about using sbi->meta_inode instead of bd_inode? Then we can do caching of summary pages for further I/O. > > + struct address_space *mapping = inode->i_mapping; > > int page_idx = start; > > + int alloced, readed; > > struct f2fs_io_info fio = { > > .type = META, > > .rw = READ_SYNC | REQ_META | REQ_PRIO > > @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info *sbi, > > struct list_head *pages, > > > > for (; page_idx < start + nrpages; page_idx++) { > > /* alloc temporal page for read node summary info*/ > > - page = alloc_page(GFP_F2FS_ZERO); > > + page = grab_cache_page(mapping, page_idx); > > if (!page) > > break; > > - > > - lock_page(page); > > - page->index = page_idx; > > - list_add_tail(&page->lru, pages); > > + page_cache_release(page); > > IMO, we don't need to do like this. > Instead, > for() { > page = grab_cache_page(); > if (!page) > break; > pages[page_idx] = page; > f2fs_submit_page_mbio(sbi, page, &fio); > } > f2fs_submit_merged_bio(sbi, META, READ); > return page_idx - start; > > Afterwards, in restore_node_summary(), > lock_page() will wait the end_io for read. > ... > f2fs_put_page(pages[index], 1); > > Thanks, > > > } > > > > - list_for_each_entry(page, pages, lru) > > - f2fs_submit_page_mbio(sbi, page, page->index, &fio); > > + alloced = page_idx - start; > > + readed = find_get_pages_contig(mapping, start, alloced, pages); > > + BUG_ON(alloced != readed); > > + > > + for (page_idx = 0; page_idx < readed; page_idx++) > > + f2fs_submit_page_mbio(sbi, pages[page_idx], > > + pages[page_idx]->index, &fio); > > > > f2fs_submit_merged_bio(sbi, META, READ); > > > > - return page_idx - start; > > + return readed; > > } > > > > int restore_node_summary(struct f2fs_sb_info *sbi, > > @@ -1694,11 +1699,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > > { > > struct f2fs_node *rn; > > struct f2fs_summary *sum_entry; > > - struct page *page, *tmp; > > + struct inode *inode = sbi->sb->s_bdev->bd_inode; > > block_t addr; > > int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); > > - int i, last_offset, nrpages, err = 0; > > - LIST_HEAD(page_list); > > + struct page *pages[bio_blocks]; > > + int i, index, last_offset, nrpages, err = 0; > > > > /* scan the node segment */ > > last_offset = sbi->blocks_per_seg; > > @@ -1709,29 +1714,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, > > nrpages = min(last_offset - i, bio_blocks); > > > > /* read ahead node pages */ > > - nrpages = ra_sum_pages(sbi, &page_list, addr, nrpages); > > + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); > > if (!nrpages) > > return -ENOMEM; > > >
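The restore_node_summary() loop discussed above walks the node segment in bio-sized readahead batches: each pass requests min(last_offset - i, bio_blocks) pages. As a rough userspace model of that batching arithmetic (the helper name below is illustrative, not f2fs code):

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Models restore_node_summary()'s outer loop: the segment holds
 * blocks_per_seg node blocks and each ra_sum_pages() call reads at
 * most bio_blocks of them.  Returns how many batches are issued. */
int count_ra_batches(int blocks_per_seg, int bio_blocks)
{
    int i, nrpages, batches = 0;
    for (i = 0; i < blocks_per_seg; i += nrpages) {
        nrpages = min_int(blocks_per_seg - i, bio_blocks);
        batches++;
    }
    return batches;
}
```

With blocks_per_seg = 512 and bio_blocks = 64, for example, the scan issues eight readahead batches; a partial tail batch is counted once.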
Re: [f2fs-dev] [PATCH 5/5] f2fs: add a wait queue to avoid unnecessary build_free_nid
On 금, 2014-03-07 at 18:43 +0800, Gu Zheng wrote: > Previously, when we try to alloc a free nid while the build of free nids > is going on, the allocator will run into the flow that waits for > "nm_i->build_lock", see the following: > /* We should not use stale free nids created by build_free_nids */ > > if (nm_i->fcnt && !on_build_free_nids(nm_i)) { > f2fs_bug_on(list_empty(&nm_i->free_nid_list)); > list_for_each(this, &nm_i->free_nid_list) { > i = list_entry(this, struct free_nid, list); > if (i->state == NID_NEW) > break; > } > > f2fs_bug_on(i->state != NID_NEW); > *nid = i->nid; > i->state = NID_ALLOC; > nm_i->fcnt--; > spin_unlock(&nm_i->free_nid_list_lock); > return true; > } > spin_unlock(&nm_i->free_nid_list_lock); > > /* Let's scan nat pages and its caches to get free nids */ > > mutex_lock(&nm_i->build_lock); > build_free_nids(sbi); > mutex_unlock(&nm_i->build_lock); > and this will cause another unnecessary build of free nids if the current > build job is already done. > So here we introduce a wait_queue to avoid this issue. > > Signed-off-by: Gu Zheng > --- > fs/f2fs/f2fs.h |1 + > fs/f2fs/node.c | 10 +- > 2 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index f845e92..7ae193e 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -256,6 +256,7 @@ struct f2fs_nm_info { > spinlock_t free_nid_list_lock; /* protect free nid list */ > unsigned int fcnt; /* the number of free node id */ > struct mutex build_lock;/* lock for build free nids */ > + wait_queue_head_t build_wq; /* wait queue for build free nids */ > > /* for checkpoint */ > char *nat_bitmap; /* NAT bitmap pointer */ > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > index 4b7861d..ab44711 100644 > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1422,7 +1422,13 @@ retry: > spin_lock(&nm_i->free_nid_list_lock); > > /* We should not use stale free nids created by build_free_nids */ > - if (nm_i->fcnt && !on_build_free_nids(nm_i)) { > + if (on_build_free_nids(nm_i)) { > + spin_unlock(&nm_i->free_nid_list_lock); > + wait_event(nm_i->build_wq, !on_build_free_nids(nm_i)); > + goto retry; > + } > + It would be better to move spin_lock(free_nid_list_lock) here after removing the spin_unlock() above. > + if (nm_i->fcnt) { > f2fs_bug_on(list_empty(&nm_i->free_nid_list)); > list_for_each(this, &nm_i->free_nid_list) { > i = list_entry(this, struct free_nid, list); > @@ -1443,6 +1449,7 @@ retry: > mutex_lock(&nm_i->build_lock); > build_free_nids(sbi); > mutex_unlock(&nm_i->build_lock); > + wake_up_all(&nm_i->build_wq); > goto retry; > } > > @@ -1813,6 +1820,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi) > INIT_LIST_HEAD(&nm_i->dirty_nat_entries); > > mutex_init(&nm_i->build_lock); > + init_waitqueue_head(&nm_i->build_wq); > spin_lock_init(&nm_i->free_nid_list_lock); > rwlock_init(&nm_i->nat_tree_lock);
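The behavior the patch aims for — sleep on a wait queue while another task is inside build_free_nids(), then retry, instead of piling up on build_lock and rebuilding redundantly — can be sketched in userspace with a pthread condition variable. Everything below (names, counts, the usleep standing in for the NAT scan) is an illustrative model, not the kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

#define NWAITERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t build_done = PTHREAD_COND_INITIALIZER;
static int fcnt;     /* free nids available (models nm_i->fcnt)        */
static int building; /* a builder is running (models on_build_free_nids) */
static int builds;   /* how many times the expensive build actually ran */

/* One allocator: take a free nid if available; if a build is already in
 * flight, sleep on the condition variable and retry after the broadcast;
 * otherwise become the builder and wake everyone when done. */
static void *alloc_nid(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        if (fcnt > 0) {                 /* fast path: consume a nid */
            fcnt--;
            break;
        }
        if (building) {                 /* build in flight: wait, retry */
            pthread_cond_wait(&build_done, &lock);
            continue;
        }
        building = 1;                   /* we become the builder */
        builds++;
        pthread_mutex_unlock(&lock);
        usleep(10000);                  /* models scanning NAT pages */
        pthread_mutex_lock(&lock);
        fcnt = NWAITERS;                /* the build produced free nids */
        building = 0;
        pthread_cond_broadcast(&build_done);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs NWAITERS allocators against an empty free-nid list and returns how
 * many builds were triggered; with the wait-queue scheme it is exactly one. */
int run_build_demo(void)
{
    pthread_t t[NWAITERS];
    fcnt = 0; building = 0; builds = 0;
    for (int i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, alloc_nid, NULL);
    for (int i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    return builds;
}
```

Because the builder flag and the free count are checked under one mutex, only the first thread runs the build; the others either consume a free nid or sleep until the broadcast, which is exactly the redundant-rebuild case the patch removes.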
Re: [f2fs-dev] [PATCH 2/4] f2fs: handle dirty segments inside refresh_sit_entry
Hi, I found some redundant code in your patch. I think that locate_dirty_segment(sbi, old_cursegno) is equal to locate_dirty_segment(sbi, GET_SEGNO(sbi, new)) in refresh_sit_entry, because *new_blkaddr is a block belonging to old_cursegno. What do you think? On 화, 2014-01-28 at 14:54 +0900, Jaegeuk Kim wrote: > This patch cleans up the refresh_sit_entry to handle locate_dirty_segments. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/f2fs.h| 1 + > fs/f2fs/segment.c | 19 --- > 2 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 42903c3..6e9515d 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1132,6 +1132,7 @@ void destroy_node_manager_caches(void); > void f2fs_balance_fs(struct f2fs_sb_info *); > void f2fs_balance_fs_bg(struct f2fs_sb_info *); > void invalidate_blocks(struct f2fs_sb_info *, block_t); > +void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t); > void clear_prefree_segments(struct f2fs_sb_info *); > int npages_for_summary_flush(struct f2fs_sb_info *); > void allocate_new_segments(struct f2fs_sb_info *); > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 7caac5f..89aa503 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -434,12 +434,14 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, > block_t blkaddr, int del) > get_sec_entry(sbi, segno)->valid_blocks += del; > } > > -static void refresh_sit_entry(struct f2fs_sb_info *sbi, > - block_t old_blkaddr, block_t new_blkaddr) > +void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t new) > { > - update_sit_entry(sbi, new_blkaddr, 1); > - if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) > - update_sit_entry(sbi, old_blkaddr, -1); > + update_sit_entry(sbi, new, 1); > + if (GET_SEGNO(sbi, old) != NULL_SEGNO) > + update_sit_entry(sbi, old, -1); > + > + locate_dirty_segment(sbi, GET_SEGNO(sbi, old)); > + locate_dirty_segment(sbi, GET_SEGNO(sbi, new)); > } > > void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr) > @@ -886,12 +888,11 @@ void allocate_data_block(struct f2fs_sb_info *sbi, > struct page *page, >* since SSR needs latest valid block information. >*/ > refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); > + locate_dirty_segment(sbi, old_cursegno); > > if (!__has_curseg_space(sbi, type)) > sit_i->s_ops->allocate_segment(sbi, type, false); > > - locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > mutex_unlock(&sit_i->sentry_lock); > > if (page && IS_NODESEG(type)) > @@ -992,9 +993,7 @@ void recover_data_page(struct f2fs_sb_info *sbi, > __add_sum_entry(sbi, type, sum); > > refresh_sit_entry(sbi, old_blkaddr, new_blkaddr); > - > locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > > mutex_unlock(&sit_i->sentry_lock); > mutex_unlock(&curseg->curseg_mutex); > @@ -1045,9 +1044,7 @@ void rewrite_node_page(struct f2fs_sb_info *sbi, > f2fs_submit_page_mbio(sbi, page, new_blkaddr, &fio); > f2fs_submit_merged_bio(sbi, NODE, WRITE); > refresh_sit_entry(sbi, old_blkaddr, new_blkaddr); > - > locate_dirty_segment(sbi, old_cursegno); > - locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr)); > > mutex_unlock(&sit_i->sentry_lock); > mutex_unlock(&curseg->curseg_mutex);
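The effect of this cleanup — refresh_sit_entry() adjusts the valid-block counts for the old and new blocks' segments and then marks both segments dirty itself, so callers drop their duplicated locate_dirty_segment() calls — can be modelled with a toy segment table. This is a deliberate simplification (plain ints for segment numbers, -1 standing in for NULL_SEGNO), not the f2fs structures:

```c
#include <assert.h>

#define NSEGS 8
static int valid_blocks[NSEGS]; /* per-segment valid block count */
static int dirty[NSEGS];        /* per-segment dirty flag        */

static void locate_dirty_segment(int segno)
{
    if (segno >= 0)             /* -1 models NULL_SEGNO: nothing to mark */
        dirty[segno] = 1;
}

/* Toy refresh_sit_entry(): bump the new block's segment, drop the old
 * block's segment (if any), then mark both segments dirty in one place. */
void refresh_sit_entry(int old_segno, int new_segno)
{
    valid_blocks[new_segno] += 1;      /* update_sit_entry(new, 1)  */
    if (old_segno >= 0)
        valid_blocks[old_segno] -= 1;  /* update_sit_entry(old, -1) */
    locate_dirty_segment(old_segno);
    locate_dirty_segment(new_segno);
}
```

A caller such as allocate_data_block() then only has to handle its own old_cursegno; the per-block dirty marking lives in one function instead of being repeated at every call site.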
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
As you know, if data or a function is used only once, we can use keywords like __initdata for data and __init for functions. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 7:52 PM To: 'Changman Lee'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Hi Lee, > -Original Message- > From: Changman Lee [mailto:cm224@samsung.com] > Sent: Tuesday, October 29, 2013 3:36 PM > To: 'Chao Yu'; jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or > zeros bitmap > with bitops for better mount performance > > Review attached patch, please. Could we hide the pre-calculated value by generating it in memory allocated by a function, because the value will be of no use after build_sit_entries()? Regards Yu > > -Original Message- > From: Chao Yu [mailto:chao2...@samsung.com] > Sent: Tuesday, October 29, 2013 3:51 PM > To: jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros > bitmap with > bitops for better mount performance > > Previously, check_block_count check valid_map with bit data type in > common scenario that sit has all ones or zeros bitmap, it makes low > mount performance. > So let's check the special bitmap with integer data type instead of > the bit one. > > v1-->v2: > use find_next_{zero_}bit_le for better performance and readable as > Jaegeuk suggested. > use neat logogram in comment as Gu Zheng suggested. > search continuous ones or zeros for better performance when checking > mixed bitmap. 
> > Suggested-by: Jaegeuk Kim > Signed-off-by: Shu Tan > Signed-off-by: Chao Yu > --- > fs/f2fs/segment.h | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index > abe7094..a7abfa8 > 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -550,8 +550,9 @@ static inline void check_block_count(struct > f2fs_sb_info *sbi, { > struct f2fs_sm_info *sm_info = SM_I(sbi); > unsigned int end_segno = sm_info->segment_count - 1; > + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? true : false; > int valid_blocks = 0; > - int i; > + int cur_pos = 0, next_pos; > > /* check segment usage */ > BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 > +561,19 @@ static inline void check_block_count(struct f2fs_sb_info > +*sbi, > BUG_ON(segno > end_segno); > > /* check bitmap with valid block count */ > - for (i = 0; i < sbi->blocks_per_seg; i++) > - if (f2fs_test_bit(i, raw_sit->valid_map)) > - valid_blocks++; > + do { > + if (is_valid) { > + next_pos = > find_next_zero_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + valid_blocks += next_pos - cur_pos; > + } else > + next_pos = find_next_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + cur_pos = next_pos; > + is_valid = !is_valid; > + } while (cur_pos < sbi->blocks_per_seg); > BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } > > -- > 1.7.9.5 > > > > -- > Android is increasing in popularity, but the open development platform that > developers love is also attractive to malware creators. Download this white > paper to learn more about secure code signing practices that can help > keep Android apps secure. 
> http://pubads.g.doubleclick.net/gampad/clk?id=65839951=/4140/ostg.c > lktr > k > ___ > Linux-f2fs-devel mailing list > linux-f2fs-de...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
Firstly, thanks. You're right. And I don't know whether it would be optimized, but consider the pipeline: for (i = 0; i < SIT_VBLOCK_MAP_SIZE; i += 4) { valid_blocks += bit_count_byte(raw_sit->valid_map[i]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+1]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+2]); valid_blocks += bit_count_byte(raw_sit->valid_map[i+3]); } Secondly, I also think your patch is good in many cases where the filesystem has NOT been aging for a long time. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 7:07 PM To: 'Changman Lee'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Hi Lee, It's a good point. Firstly, in your patch: /* check bitmap with valid block count */ for (i = 0; i < sbi->blocks_per_seg; i++) - if (f2fs_test_bit(i, raw_sit->valid_map)) - valid_blocks++; + valid_blocks += bit_count_byte(raw_sit->valid_map[i]); + BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } for (i = 0; i < sbi->blocks_per_seg; i++) should be replaced with for (i = 0; i < SIT_VBLOCK_MAP_SIZE; i++) Secondly, I tested your patch and mine with SD and eMMC with an all-zeros bitmap. It shows my patch takes less time. Could you test and compare the performance of the two patches? -- 1.7.10.4 > -Original Message- > From: Changman Lee [mailto:cm224@samsung.com] > Sent: Tuesday, October 29, 2013 3:36 PM > To: 'Chao Yu'; jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap > with bitops for better mount performance > > Review attached patch, please. 
> > -Original Message- > From: Chao Yu [mailto:chao2...@samsung.com] > Sent: Tuesday, October 29, 2013 3:51 PM > To: jaegeuk@samsung.com > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-f2fs-de...@lists.sourceforge.net > Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with > bitops for better mount performance > > Previously, check_block_count check valid_map with bit data type in common > scenario that sit has all ones or zeros bitmap, it makes low mount > performance. > So let's check the special bitmap with integer data type instead of the bit one. > > v1-->v2: > use find_next_{zero_}bit_le for better performance and readable as > Jaegeuk suggested. > use neat logogram in comment as Gu Zheng suggested. > search continuous ones or zeros for better performance when checking > mixed bitmap. > > Suggested-by: Jaegeuk Kim > Signed-off-by: Shu Tan > Signed-off-by: Chao Yu > --- > fs/f2fs/segment.h | 19 +++ > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index abe7094..a7abfa8 > 100644 > --- a/fs/f2fs/segment.h > +++ b/fs/f2fs/segment.h > @@ -550,8 +550,9 @@ static inline void check_block_count(struct > f2fs_sb_info *sbi, { > struct f2fs_sm_info *sm_info = SM_I(sbi); > unsigned int end_segno = sm_info->segment_count - 1; > + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? 
true : false; > int valid_blocks = 0; > - int i; > + int cur_pos = 0, next_pos; > > /* check segment usage */ > BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 > +561,19 @@ static inline void check_block_count(struct f2fs_sb_info > +*sbi, > BUG_ON(segno > end_segno); > > /* check bitmap with valid block count */ > - for (i = 0; i < sbi->blocks_per_seg; i++) > - if (f2fs_test_bit(i, raw_sit->valid_map)) > - valid_blocks++; > + do { > + if (is_valid) { > + next_pos = > find_next_zero_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + valid_blocks += next_pos - cur_pos; > + } else > + next_pos = find_next_bit_le(&raw_sit->valid_map, > + sbi->blocks_per_seg, > + cur_pos); > + cur_pos = next_pos; > + is_valid = !is_valid; > + } while (cur_pos < sbi->blocks_per_seg); > BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } > > -- > 1.7.9.5 > >
RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
Review attached patch, please. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 3:51 PM To: jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Previously, check_block_count check valid_map with bit data type in common scenario that sit has all ones or zeros bitmap, it makes low mount performance. So let's check the special bitmap with integer data type instead of the bit one. v1-->v2: use find_next_{zero_}bit_le for better performance and readable as Jaegeuk suggested. use neat logogram in comment as Gu Zheng suggested. search continuous ones or zeros for better performance when checking mixed bitmap. Suggested-by: Jaegeuk Kim Signed-off-by: Shu Tan Signed-off-by: Chao Yu --- fs/f2fs/segment.h | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index abe7094..a7abfa8 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -550,8 +550,9 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, { struct f2fs_sm_info *sm_info = SM_I(sbi); unsigned int end_segno = sm_info->segment_count - 1; + bool is_valid = test_bit_le(0, raw_sit->valid_map) ? 
true : false; int valid_blocks = 0; - int i; + int cur_pos = 0, next_pos; /* check segment usage */ BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg); @@ -560,9 +561,19 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, BUG_ON(segno > end_segno); /* check bitmap with valid block count */ - for (i = 0; i < sbi->blocks_per_seg; i++) - if (f2fs_test_bit(i, raw_sit->valid_map)) - valid_blocks++; + do { + if (is_valid) { + next_pos = find_next_zero_bit_le(&raw_sit->valid_map, + sbi->blocks_per_seg, + cur_pos); + valid_blocks += next_pos - cur_pos; + } else + next_pos = find_next_bit_le(&raw_sit->valid_map, + sbi->blocks_per_seg, + cur_pos); + cur_pos = next_pos; + is_valid = !is_valid; + } while (cur_pos < sbi->blocks_per_seg); BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } -- 1.7.9.5 -- Android is increasing in popularity, but the open development platform that developers love is also attractive to malware creators. Download this white paper to learn more about secure code signing practices that can help keep Android apps secure. http://pubads.g.doubleclick.net/gampad/clk?id=65839951=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel 0001-f2fs-use-pre-calculated-value-to-get-sum-of-valid-bl.patch Description: Binary data
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.
On 금, 2013-06-14 at 13:20 +0900, Namjae Jeon wrote: > 2013/6/11, Namjae Jeon : > > 2013/6/11, Changman Lee : > >> On 화, 2013-06-11 at 07:57 +0900, Namjae Jeon wrote: > >>> 2013/6/10, Changman Lee : > >>> > Hello, Namjae > >>> Hi. Changman. > >>> > > >>> > If using ACL, whenever i_mode is changed we should update acl_mode > >>> > which > >>> > is written to xattr block, too. And vice versa. > >>> > Because update_inode() is called at any reason and anytime, so we > >>> > should > >>> > sync both the moment xattr is written. > >>> > We don't hope that only i_mode is written to disk and xattr is not. So > >>> > f2fs_setattr is dirty. > >>> Yes, agreed this could be issue. > >>> > > >>> > And, below code has a bug. When error is occurred, inode->i_mode > >>> > shouldn't be changed. Please, check one more time, Namjae. > >>> And, below code has a bug. When error is occurred, inode->i_mode > >>> shouldn't be changed. Please, check one more time, Namjae. > >>> > >>> This was part of the default code, when ‘acl’ is not set for file’ > >>> Then, inode should be updated by these conditions (i.e., it covers the > >>> ‘chmod’ and ‘setacl’ scenario). > >>> When ACL is not present on the file and ‘chmod’ is done, then mode is > >>> changed from this part, as f2fs_get_acl() will fail and cause the > >>> below code to be executed: > >>> > >>> if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > >>> inode->i_mode = fi->i_acl_mode; > >>> clear_inode_flag(fi, FI_ACL_MODE); > >>> } > >>> > >>> Now, in order to make it consistent and work on all scenario we need > >>> to make further change like this in addition to the patch changes. > >>> setattr_copy(inode, attr); > >>> if (attr->ia_valid & ATTR_MODE) { > >>> + set_acl_inode(fi, inode->i_mode); > >>> err = f2fs_acl_chmod(inode); > >>> if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > >>> > >>> Let me know your opinion. > >>> Thanks. > >>> > >> > >> setattr_copy changes inode->i_mode, this is not our expectation. 
> >> So I made redundant __setatt_copy that copy attr->mode to > >> fi->i_acl_mode. > >> When acl_mode is reflected in xattr, acl_mode is copied to > >> inode->i_mode. > >> > >> Agree? > >> > > Hi Changman. > > > > First, Sorry for interrupt. > > I think that inode i_mode should be updated regardless of f2fs_acl_chmod. > > Actually I am still not understand the reason why we should use > > temporarily acl mode(i_acl_mode). > > I wroted the v2 patch to not use i_acl_mode like this. > > Am I missing something ? > To Changman, > I am still waiting for your reply. Correct us if we are wrong or > missing something. > > Hi Jaegeuk, > Could you please share your views on this? > > Thanks. Sorry for the late reply. I was very busy. Could you tell me: if a difference arises between the xattr and i_mode, what will you do? The purpose of i_acl_mode is to update i_mode and the xattr together in the same lock region. > > > > > > > Subject: [PATCH v2] f2fs: reorganize the f2fs_setattr(), f2fs_set_acl, > > f2fs_setxattr() > > From: Namjae Jeon > > > > Remove the redundant code from f2fs_setattr() function and make it aligned > > with usages of generic vfs layer function e.g using the setattr_copy() > > instead of using the f2fs specific function. > > > > Also correct the condition for updating the size of file via > > truncate_setsize(). > > > > Also modify the code of f2fs_set_acl and f2fs_setxattr for removing the > > redundant code & add the required changes to correct the requested > > operations. > > > > Remove the variable "i_acl_mode" from the f2fs_inode_info struct since > > i_mode will > > hold the latest 'mode' value which can be used for any further > > references. And in > > order to make 'chmod' work without ACL support, inode i_mode should be > > first > > updated correctly. > > > > Remove the helper functions to access and set the i_acl_mode. > > > > Signed-off
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.
On 화, 2013-06-11 at 07:57 +0900, Namjae Jeon wrote: > 2013/6/10, Changman Lee : > > Hello, Namjae > Hi. Changman. > > > > If using ACL, whenever i_mode is changed we should update acl_mode which > > is written to xattr block, too. And vice versa. > > Because update_inode() is called at any reason and anytime, so we should > > sync both the moment xattr is written. > > We don't hope that only i_mode is written to disk and xattr is not. So > > f2fs_setattr is dirty. > Yes, agreed this could be issue. > > > > And, below code has a bug. When error is occurred, inode->i_mode > > shouldn't be changed. Please, check one more time, Namjae. > And, below code has a bug. When error is occurred, inode->i_mode > shouldn't be changed. Please, check one more time, Namjae. > > This was part of the default code, when ‘acl’ is not set for file’ > Then, inode should be updated by these conditions (i.e., it covers the > ‘chmod’ and ‘setacl’ scenario). > When ACL is not present on the file and ‘chmod’ is done, then mode is > changed from this part, as f2fs_get_acl() will fail and cause the > below code to be executed: > > if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > inode->i_mode = fi->i_acl_mode; > clear_inode_flag(fi, FI_ACL_MODE); > } > > Now, in order to make it consistent and work on all scenario we need > to make further change like this in addition to the patch changes. > setattr_copy(inode, attr); > if (attr->ia_valid & ATTR_MODE) { > + set_acl_inode(fi, inode->i_mode); > err = f2fs_acl_chmod(inode); > if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > > Let me know your opinion. > Thanks. > setattr_copy changes inode->i_mode, which is not our expectation. So I made a redundant __setattr_copy that copies attr->mode to fi->i_acl_mode. When acl_mode is reflected in the xattr, acl_mode is copied to inode->i_mode. Agree? 
> > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > index deefd25..29cd449 100644 > > --- a/fs/f2fs/file.c > > +++ b/fs/f2fs/file.c > > @@ -352,10 +352,8 @@ int f2fs_setattr(struct dentry *dentry, struct > > iattr *attr) > > > > if (attr->ia_valid & ATTR_MODE) { > > err = f2fs_acl_chmod(inode); > > - if (err || is_inode_flag_set(fi, FI_ACL_MODE)) { > > - inode->i_mode = fi->i_acl_mode; > > + if (err || is_inode_flag_set(fi, FI_ACL_MODE)) > > clear_inode_flag(fi, FI_ACL_MODE); > > - } > > } > > > > Thanks. > > > > > > On 토, 2013-06-08 at 21:25 +0900, Namjae Jeon wrote: > >> From: Namjae Jeon > >> > >> Remove the redundant code from this function and make it aligned with > >> usages of latest generic vfs layer function e.g using the setattr_copy() > >> instead of using the f2fs specific function. > >> > >> Also correct the condition for updating the size of file via > >> truncate_setsize(). > >> > >> Signed-off-by: Namjae Jeon > >> Signed-off-by: Pankaj Kumar > >> --- > >> fs/f2fs/acl.c |5 + > >> fs/f2fs/file.c | 47 +-- > >> 2 files changed, 6 insertions(+), 46 deletions(-) > >> > >> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c > >> index 44abc2f..2d13f44 100644 > >> --- a/fs/f2fs/acl.c > >> +++ b/fs/f2fs/acl.c > >> @@ -17,9 +17,6 @@ > >> #include "xattr.h" > >> #include "acl.h" > >> > >> -#define get_inode_mode(i) ((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? 
> >> \ > >> - (F2FS_I(i)->i_acl_mode) : ((i)->i_mode)) > >> - > >> static inline size_t f2fs_acl_size(int count) > >> { > >>if (count <= 4) { > >> @@ -299,7 +296,7 @@ int f2fs_acl_chmod(struct inode *inode) > >>struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb); > >>struct posix_acl *acl; > >>int error; > >> - umode_t mode = get_inode_mode(inode); > >> + umode_t mode = inode->i_mode; > >> > >>if (!test_opt(sbi, POSIX_ACL)) > >>return 0; > >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > >> index deefd25..8dfc1da 100644 > >> --- a/fs/f2fs/file.c > >> +++ b/fs/f2fs/file.c > >> @@ -300,63 +300,26 @@ static int f2fs_getattr(struct vfsmount *mnt, >
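The FI_ACL_MODE dance debated above — park the new mode in i_acl_mode, write the ACL xattr, then either commit the mode to i_mode or roll it back — can be sketched as a small userspace model. This is illustrative C, not kernel code: the struct, the bool flag, and finish_mode_change() are simplified stand-ins for f2fs_inode_info, is_inode_flag_set()/clear_inode_flag(), and the tail of f2fs_setattr(), and it implements the corrected semantics (i_mode unchanged on error) that the thread converges on.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct inode + struct f2fs_inode_info. */
struct model_inode {
	unsigned int i_mode;      /* mode visible to the VFS */
	unsigned int i_acl_mode;  /* pending mode, valid while acl_mode_set */
	bool acl_mode_set;        /* stands in for the FI_ACL_MODE flag */
};

/* __setattr_copy() parks the requested mode instead of touching i_mode. */
static void set_acl_inode(struct model_inode *inode, unsigned int mode)
{
	inode->i_acl_mode = mode;
	inode->acl_mode_set = true;
}

/* After the ACL xattr write: commit the pending mode on success,
 * discard it on error, and clear the flag either way. */
static void finish_mode_change(struct model_inode *inode, int err)
{
	if (inode->acl_mode_set) {
		if (!err)
			inode->i_mode = inode->i_acl_mode;
		inode->acl_mode_set = false;  /* clear_inode_flag(FI_ACL_MODE) */
	}
}
```

On success the chmod becomes visible; on failure i_mode keeps its old value, which is exactly the invariant Changman asks for.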
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.

Hello, Namjae

If using ACL, whenever i_mode is changed we should also update acl_mode,
which is written to the xattr block, and vice versa.
Because update_inode() can be called for any reason at any time, we should
sync both at the moment the xattr is written.
We don't want only i_mode to be written to disk while the xattr is not.
That is why f2fs_setattr is dirty.

Also, the code below has a bug. When an error occurs, inode->i_mode
shouldn't be changed. Please check one more time, Namjae.

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index deefd25..29cd449 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -352,10 +352,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (attr->ia_valid & ATTR_MODE) {
 		err = f2fs_acl_chmod(inode);
-		if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
-			inode->i_mode = fi->i_acl_mode;
+		if (err || is_inode_flag_set(fi, FI_ACL_MODE))
 			clear_inode_flag(fi, FI_ACL_MODE);
-		}
 	}

Thanks.

On Sat, 2013-06-08 at 21:25 +0900, Namjae Jeon wrote:
> From: Namjae Jeon
>
> Remove the redundant code from this function and make it aligned with
> usage of the latest generic VFS layer functions, e.g. using setattr_copy()
> instead of the f2fs-specific function.
>
> Also correct the condition for updating the size of the file via
> truncate_setsize().
>
> Signed-off-by: Namjae Jeon
> Signed-off-by: Pankaj Kumar
> ---
>  fs/f2fs/acl.c  |  5 +
>  fs/f2fs/file.c | 47 +--
>  2 files changed, 6 insertions(+), 46 deletions(-)
>
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index 44abc2f..2d13f44 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -17,9 +17,6 @@
>  #include "xattr.h"
>  #include "acl.h"
>
> -#define get_inode_mode(i)	((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? \
> -			(F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
> -
>  static inline size_t f2fs_acl_size(int count)
>  {
>  	if (count <= 4) {
> @@ -299,7 +296,7 @@ int f2fs_acl_chmod(struct inode *inode)
>  	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
>  	struct posix_acl *acl;
>  	int error;
> -	umode_t mode = get_inode_mode(inode);
> +	umode_t mode = inode->i_mode;
>
>  	if (!test_opt(sbi, POSIX_ACL))
>  		return 0;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index deefd25..8dfc1da 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -300,63 +300,26 @@ static int f2fs_getattr(struct vfsmount *mnt,
>  	return 0;
>  }
>
> -#ifdef CONFIG_F2FS_FS_POSIX_ACL
> -static void __setattr_copy(struct inode *inode, const struct iattr *attr)
> -{
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
> -	unsigned int ia_valid = attr->ia_valid;
> -
> -	if (ia_valid & ATTR_UID)
> -		inode->i_uid = attr->ia_uid;
> -	if (ia_valid & ATTR_GID)
> -		inode->i_gid = attr->ia_gid;
> -	if (ia_valid & ATTR_ATIME)
> -		inode->i_atime = timespec_trunc(attr->ia_atime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_MTIME)
> -		inode->i_mtime = timespec_trunc(attr->ia_mtime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_CTIME)
> -		inode->i_ctime = timespec_trunc(attr->ia_ctime,
> -						inode->i_sb->s_time_gran);
> -	if (ia_valid & ATTR_MODE) {
> -		umode_t mode = attr->ia_mode;
> -
> -		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> -			mode &= ~S_ISGID;
> -		set_acl_inode(fi, mode);
> -	}
> -}
> -#else
> -#define __setattr_copy setattr_copy
> -#endif
> -
>  int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
>  {
>  	struct inode *inode = dentry->d_inode;
> -	struct f2fs_inode_info *fi = F2FS_I(inode);
>  	int err;
>
>  	err = inode_change_ok(inode, attr);
>  	if (err)
>  		return err;
>
> -	if ((attr->ia_valid & ATTR_SIZE) &&
> -		attr->ia_size != i_size_read(inode)) {
> -		truncate_setsize(inode, attr->ia_size);
> +	if ((attr->ia_valid & ATTR_SIZE)) {
> +		if (attr->ia_size != i_size_read(inode))
> +			truncate_setsize(inode, attr->ia_size);
>  		f2fs_truncate(inode);
>  		f2fs_balance_fs(F2FS_SB(inode->i_sb));
>  	}
>
> -	__setattr_copy(inode, attr);
> +	setattr_copy(inode, attr);
>
> -	if (attr->ia_valid & ATTR_MODE) {
> +	if (attr->ia_valid & ATTR_MODE)
>  		err = f2fs_acl_chmod(inode);
> -		if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
> -			inode->i_mode = fi->i_acl_mode;
> -			clear_inode_flag(fi, FI_ACL_MODE);
> -		}
> -	}
>
>  	mark_inode_dirty(inode);
>  	return err;
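One detail worth pulling out of the __setattr_copy() removed above is the setgid rule it preserves from the generic setattr_copy(): a chmod by a caller who is neither in the file's owning group nor holding CAP_FSETID must drop S_ISGID. A hedged sketch of just that rule — in_group and has_fsetid are hypothetical stand-ins for in_group_p() and capable(CAP_FSETID), and S_ISGID_BIT restates the standard 02000 value:

```c
#include <assert.h>

#define S_ISGID_BIT 02000  /* same value as S_ISGID in <sys/stat.h> */

/* Compute the mode a chmod should actually install: callers outside the
 * owning group without CAP_FSETID lose the setgid bit. */
static unsigned int apply_mode(unsigned int requested_mode,
			       int in_group, int has_fsetid)
{
	unsigned int mode = requested_mode;

	if (!in_group && !has_fsetid)
		mode &= ~S_ISGID_BIT;  /* mode &= ~S_ISGID in the kernel */
	return mode;
}
```

In the patched flow this computed mode is what set_acl_inode() parks in i_acl_mode rather than being written straight to i_mode.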
Re: [PATCH 2/6] f2fs: move out f2fs_balance_fs from gc_thread_func

As you know, f2fs_balance_fs conducts GC if f2fs does not have enough free
sections. But the purpose of background GC is to conduct GC during idle time
without checking free sections, so that f2fs can relieve fragmentation.
Could you review this?

-- >8 --
From fbda3262dac81c4f0d7ae8b9b757c820da593120 Mon Sep 17 00:00:00 2001
From: Changman Lee
Date: Mon, 4 Feb 2013 10:05:09 +0900
Subject: [PATCH] f2fs: remove unnecessary gc option check and balance_fs

1. If f2fs is mounted with the background_gc_off option, checking BG_GC in
   the GC thread is not necessary.
2. f2fs_balance_fs is checked in f2fs_gc, so this is also redundant.

Signed-off-by: Changman Lee
Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
---
 fs/f2fs/gc.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 8fe43f3..e5c47f6 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -49,11 +49,6 @@ static int gc_thread_func(void *data)
 			continue;
 		}
 
-		f2fs_balance_fs(sbi);
-
-		if (!test_opt(sbi, BG_GC))
-			continue;
-
 		/*
 		 * [GC triggering condition]
 		 * 0. GC is not conducted currently.
@@ -96,6 +91,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_gc_kthread *gc_th;
 
+	if (!test_opt(sbi, BG_GC))
+		return 0;
 	gc_th = kmalloc(sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
 	if (!gc_th)
 		return -ENOMEM;
-- 
1.7.10.4
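The effect of the patch above can be modeled in a few lines of userspace C: rather than every loop iteration of the GC thread testing BG_GC, start_gc_thread() simply never starts the thread when background GC is off. The bitmask, struct, and malloc() below are simplified stand-ins for the f2fs mount-option flags, f2fs_sb_info, and kmalloc()/kthread_run(); this is a sketch, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>

#define MOUNT_BG_GC 0x1  /* stands in for the BG_GC mount option bit */

/* Simplified stand-in for struct f2fs_sb_info. */
struct sb_info {
	unsigned int mount_opt;
	void *gc_thread;  /* non-NULL once the thread is "started" */
};

/* After the patch: the option is checked once, at thread-start time. */
static int start_gc_thread(struct sb_info *sbi)
{
	if (!(sbi->mount_opt & MOUNT_BG_GC))
		return 0;  /* background_gc_off: nothing to start */

	/* stands in for kmalloc() of f2fs_gc_kthread + kthread_run() */
	sbi->gc_thread = malloc(sizeof(int));
	if (!sbi->gc_thread)
		return -1;  /* -ENOMEM in the kernel */
	return 0;
}
```

With the check hoisted here, gc_thread_func() no longer needs the per-iteration `if (!test_opt(sbi, BG_GC)) continue;` that the diff removes.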
RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Namjae Jeon [mailto:linkinj...@gmail.com]
> Sent: Wednesday, October 17, 2012 8:14 PM
> To: Changman Lee
> Cc: Jaegeuk Kim; Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
> ty...@mit.edu; gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> chur@samsung.com; cm224@samsung.com; jooyoung.hw...@samsung.com;
> linux-fsde...@vger.kernel.org
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> 2012/10/11, Changman Lee:
> > On Thursday, 11 October 2012, Namjae Jeon wrote:
> >> 2012/10/10 Jaegeuk Kim:
> >>
> >>>> I mean that every volume is placed inside a partition (MTD or GPT).
> >>>> Every partition begins at some physical sector. So, as I understand,
> >>>> an f2fs volume can begin at a physical sector that lies inside a
> >>>> physical erase block. Thereby, in such a case of formatting, the f2fs
> >>>> operation units will be unaligned in relation to physical erase
> >>>> blocks, from my point of view. Maybe I misunderstand something, but
> >>>> it can lead to additional FTL operations and performance degradation,
> >>>> from my point of view.
> >>>
> >>> I think mkfs already calculates the offset to align that.
> >>
> >> I think this answer is not what he wants.
> >> If you don't use a partition table such as a DOS partition table or
> >> GPT, I think it is possible to align using mkfs.
> >> But if we should consider partition table space in storage, I don't
> >> understand how it could be aligned using mkfs.
> >>
> >> Thanks.
> >
> > We can know the physical starting sector address of any partition from
> > the HDIO geometry information obtained by ioctl.
>
> If so, are the first block and end block of the partition useless?
>
> Thanks.

For example: if we try to align the start point of F2FS to 2MB but the start
sector of the partition is not aligned to 2MB, then of course F2FS will have
some unused blocks. In exchange, F2FS could reduce the GC cost of the FTL.
I don't know whether my answer is what you want.
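The 2MB alignment trade-off Changman describes can be made concrete with a little arithmetic: given a partition that starts at an arbitrary 512-byte sector, count the sectors mkfs would have to leave unused so that the first 2MB f2fs segment lands on a 2MB boundary. The function and constants below are illustrative, not mkfs.f2fs's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE     512u
#define SEGMENT_BYTES   (2u * 1024 * 1024)             /* 2MB f2fs segment */
#define SEGMENT_SECTORS (SEGMENT_BYTES / SECTOR_SIZE)  /* 4096 sectors */

/* Sectors left unused ("wasted") between the partition start and the
 * first 2MB-aligned segment boundary. */
static uint64_t sectors_to_skip(uint64_t part_start_sector)
{
	uint64_t misalign = part_start_sector % SEGMENT_SECTORS;

	return misalign ? SEGMENT_SECTORS - misalign : 0;
}
```

A legacy DOS-style partition starting at sector 63 would waste 4033 sectors (just under 2MB) to alignment, while a partition starting at sector 4096 (a 2MB boundary) wastes nothing — the unused blocks Changman accepts in exchange for cheaper FTL garbage collection.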